ABSTRACT



This study began with the psychological postulate that all
human performance in a choice situation tends to be made on a
systematic basis.

In the setting of multiple choice achievement tests, this 
postulate resolved itself into three operational hypotheses which form 
a necessary and sufficient set to establish the possibility that it 
applied to both the right and the wrong answers given by examinees. 
Testing these three hypotheses involved the following procedures: 

1) To develop and logically validate a systematic method for
the construction of the foils (distractors) on a multiple
choice achievement test designed to measure higher mental
processes.

2) To show that the construct validity of this systematic 
method held up reasonably well in the results of the 
administration of this test. 

3) To show that the validly produced foils in this context
improved the predictive validity of this test with respect
to other achievement tests over the more usual procedures.

The results of the study tended in general to support these
three hypotheses fairly strongly, if we take into account the finding
that many of the foils could be classified into more than one category,
as evident in the low interrater reliability, the need to reclassify
foils when wrong-answer patterns were being interpreted, and the manner
in which these interpreted foil categories cross-validated. This study
would seem to have produced three fairly definite findings:



1. Human performance, when abstracted from responses to multiple
choice achievement tests involving higher mental processes,
would seem to be systematic, and to display evidence of
multiple interpretation of the communication.

2. There would seem to be a hierarchy of foils which parallels
the hierarchy of right answers and which influences the way
in which each total item performs. The levels of the foils
themselves seem to depend upon the ways in which this
totality of each item is approached.

3. Wrong answers contain potentially useful information with
respect to achievement when higher mental processes are
involved.

Taxonomic tests would seem to have a number of properties not
assumed to be present when the test is sufficiently homogeneous to be
assumed to form a scale. The existence of these properties made it
fairly evident that the more commonly used analytic procedures were
probably inappropriate, both for analyzing and interpreting the results
of this type of test and as criteria for evaluating its effectiveness.
The alternative analytic procedures which the findings of the study
implied were organized into a suggested extension of test theory
designed to deal with the problems which seem to arise where taxonomic
tests are concerned.

Some implications of the findings for educational practice were
drawn, and a number of suggestions for future research in this area
were presented.
CHAPTER I
INTRODUCTION 

Psychological literature has been replete with studies of choice
behavior; Chown (1959), Duncan (1959) and, more recently, Hunt (1961) and
Berlyne (1965) have reviewed this literature adequately. From these
studies it is fairly evident that the distribution of choices made by
humans in problem-solving situations tends to exhibit some systematic
trends.

These trends, however, have often been confined to discussion in
terms of the patterns of "success" when compared with the nature and
complexity of the task. Many studies, for example Strutz (1966),
concentrate mainly on "right" answers and the relevant patterns involved.
A notable second direction in this area has been the attempt to
describe the procedure a person uses in his attempt to solve problems;
for instance, Piaget (1953) proposed a logical system for these
procedures (Furth, 1969, discusses the Piaget model in some detail).
Abelson and Rosenberg (1958) proposed a logical system, which they call
"psychologic," consisting of a set of "logical" procedures that lead to
certain types of "wrong" answers.

Specifically, within the area of achievement testing, the
multiple choice test provides a good opportunity to observe choice
behavior in a problem-solving setting. With the advent of Bloom's
Taxonomy (1956), a considerable improvement in the classification of
test items involving these "higher mental processes" became possible.

The purpose of this present study will be to 1) demonstrate the
presence or absence of additional information in the wrong answers
examinees select, 2) identify the general properties, if any, of this
information, and 3) speculate as to the implications of such findings.
Assuming systematic choice behavior, consistencies would be expected
within and between individuals for all choices made. These consistencies
may not be confined to the "successes" or "right" answers. This study
proposes to explore certain aspects of the possibility that the choices
made among wrong answers may also be systematic and therefore contain
useful information for the examiner.

Statement of the Problem 

Some aspects of the possibility that wrong answers are selected
systematically have been examined (cf. Fouldes and Forbes, 1965; Powell,
1968; Jacobs and Vandeventer, 1968; Powell and Isbister, 1969), and these
studies are discussed in more detail in Chapter II.

The specific concern of this study is to show test designers
the significance of wrong-answer information. Any significant
improvement in a test must be reflected in a corresponding improvement
in the validity of the test. This study proposes to examine the
construct and predictive validities of a particular method of test
construction. The purpose of this study, then, is to explore the
possibility that, if tests are constructed in a particular manner,
"wrong" answers may add to the examiner's information about the
examinee. This present study will be content to demonstrate the
presence or absence of this additional information, to determine the
major properties of this information, and to speculate as to the
implications of such findings.



CHAPTER II 

BACKGROUND FOR THE STUDY 

As already mentioned in Chapter I, this study was essentially
exploratory (Glaser and Strauss, 1967, developed a complete rationale to
justify this procedure for exploratory studies). For this reason, little
attempt has been made to establish a theoretical rationale upon which the
hypothetical structure of the study might be built. Instead, the study was
organized on the basis of procedural considerations. The absence of a
theoretical rationale for the interpretation of results had the advantage
that the data could be examined for consistent characteristics and the
properties of these characteristics.

Of course, the nature of the procedures employed places definite
limitations upon the interpretations. The findings themselves would tend
to impose other limitations upon the interpretation, and also upon the
generalizability of the findings.

It is the purpose of the present chapter to review the most
significant research relevant to the present study in order to
present the research background which forms the basis for the
procedural considerations employed.

The Problem of Measuring Academic Performance

The functions of, and therefore the outcomes of, education are
a subject of debate which is beyond the scope of this dissertation.
However, these functions and their corresponding outcomes have an
important bearing on the nature of, and the interpretations given to,
the results of the various kinds of measuring instruments used.

Tests are formalized communications between examiner and
examinee. The examiner is attempting to obtain a controlled sample of
behavior to assist him in rendering certain judgments. These
judgments may be either value judgments (continue, withdraw, certificate)
or procedural judgments (concerning the nature of the appropriate
treatments or programmes). The greatest possible control of the behavior
sample is found in the multiple choice test.

As communications, tests involve several important considerations.
First, they involve the examiner's perception of the examinees in terms
of the capabilities they do have, and the capabilities they should have
(educational goals). These perceptions lead to the examiner's decisions
as to which information to give in the examination, in what format, and
which information to withhold.

Second, on the basis of these considerations a communication is
formulated. For the purpose of this study these communications will be
confined to the multiple choice achievement test.

Third, the communication is presented to the examinee, who is
expected to interpret it and to respond to it. He will do so on the
basis of the capabilities he possesses; the information he possesses,
particularly that part of the information pool which was withheld by the
examiner; and his sense of the importance to him, as a person, of the
answers he gives, including the information about himself which he wishes
to withhold.

Fourth, the examiner then has the task of interpreting the
responses of the examinees and making such value or procedural
judgments as may be appropriate to these interpretations and to the
purpose of the test. To form these judgments the examinee's performance
can be compared, as appropriate, with 1) his own past performance in
similar contexts, 2) the performance of others in the same context
(norm referencing), or 3) some external behavioral definition of mastery
(criterion referencing).
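The three bases of comparison can be illustrated with a minimal Python sketch. The function names, the mid-rank convention for percentile ranks, and the mastery cutoff are illustrative assumptions, not procedures taken from this study.

```python
def change_from_past(score, past_score):
    """Self-referencing: compare the examinee with his own earlier result."""
    return score - past_score


def percentile_rank(score, norm_group):
    """Norm referencing: percent of a norm group scoring below the examinee,
    counting tied scores as half below (one common convention)."""
    below = sum(s < score for s in norm_group)
    tied = sum(s == score for s in norm_group)
    return 100.0 * (below + 0.5 * tied) / len(norm_group)


def meets_criterion(score, mastery_cutoff):
    """Criterion referencing: compare with a fixed definition of mastery."""
    return score >= mastery_cutoff


# hypothetical data: one examinee's score of 30 judged three ways
norms = [20, 25, 30, 35, 40]
print(change_from_past(30, 24), percentile_rank(30, norms), meets_criterion(30, 32))
```

The same raw score of 30 reads as a gain of 6 points, an exactly average result (percentile rank 50), or a failure to reach a hypothetical mastery cutoff of 32, depending on the reference chosen.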

However, where the subject matter content is itself open to
disagreement, examiners themselves may not agree as to the
appropriateness of the communication or its interpretation. Also, the
examiner's assumptions about the capabilities and information background
of the examinees may not be congruent with their actual characteristics.
Furthermore, there may be little similarity between the examiner's
purposes and the examinee's interpretation of these purposes. In addition,
if an examinee has systematically misclassified a particular concept and
this concept recurs with a high degree of frequency in a test, the
examinee is likely to obtain a low total-correct score, whereas the
opportunity to correct this misclassification could lead to a much higher
total score. How serious, then, must a misclassification be considered?
Finally, what if the examiner misclassifies? Such an event is bound to
have an adverse effect on the total-correct score of the profoundly
informed student, as Hoffman (1962) points out. The combined effects of
such considerations upon the composition of total-correct scores
complicate their interpretation.

Technical Considerations in the Measurement of Choice Behavior

This study will confine itself to the choice behavior of
examinees as exemplified in their responses to multiple choice
achievement tests. The particular point of view to be expressed concerns
the way in which current practice tends to use wrong-answer information.
Current Practice for Scoring Achievement Tests

Present practice for scoring multiple choice achievement tests is to count
the number of "right" answers selected by the examinee on a test or a
subtest. The "right" answers are usually predetermined, although
experience with particular items may lead to subsequent revisions.
In such tests the examinee is faced with several alternatives, only one
of which is "right." This means that he can make a wrong choice among
several alternatives. In general, however, distinctions which might be
made among students on the basis of differences among the wrong answers
selected are not considered when the students' scores are evaluated.
If wrong answers are used for any purpose, it is usually to correct the
scores for guessing.

Current Practice for Evaluating Achievement Tests — Reliability 

There are three general areas in which the specific 
characteristics of a test can be improved. These are: 

1. Reliability 

2. Validity 

3. Useability 

The third of these characteristics, useability, can be dispensed
with quickly, because the simplicity of administration, the
simplicity and objectivity of scoring of multiple choice tests, and the
ease of establishing and using norms have been well established.
The other two characteristics require more discussion.

The reliability of a test. The concept of test reliability
involves how well the test measures whatever it measures. The APA
Standards (1966) lists three methods of estimating reliability.
These are:

1. Internal consistency 

2. Reliability between forms 

3. Reliability over time 

The latter two are determined by correlation coefficients either 
between alternative forms, or between repeated administrations of the 
same form on the same group of examinees. Neither of these two 
approaches is directly applicable to this study. 

There are several possible approaches to the study of internal 
consistency. These are: 

1. Item analysis 

2. Using a correction-for-guessing

3. Designing the test to form a scale

4. Examining internal characteristics of the test

5. Part-whole comparisons

Internal consistency from item analysis. There are two schools
of thought with respect to item analysis. The classical approach
assumes that all of the items on the test should measure the same
overall characteristic that the whole test, or the relevant subtest,
measures. To determine this similarity, the distribution of right
answers on a particular item is correlated with the distribution of
total-correct scores. The biserial correlation coefficient is usually
used. This correlation coefficient actually correlates a dichotomous
variable with a continuous variable. For the answers of a multiple
choice item to be dichotomous, the plural set of wrong answers must be
treated as a single variable. Similarly, the discrete distribution of
total scores (the total scores are the sums of a binary vector of
"right-wrong" decisions, so each sum must be a whole number) must be
treated as a continuous variable. The problem that score data provide a
discrete distribution is avoided by assuming that total scores are "best
estimates" of "true scores" and that true scores form a continuous
distribution. There is, of course, a multiserial correlation
(cf. Jaspen, 1946) which could be used to take account of the plurality
of "wrong" answers. This latter coefficient is rarely used to evaluate
multiple choice test items. Classical test theory advocates that the
biserial correlation coefficient for each item should be high
(significantly different from zero).
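The classical item statistic described above can be sketched in Python using only the standard library. The function name and data are hypothetical; the sketch assumes the usual biserial formula, r = (M_right - M_wrong) / s_total x (p q / y), where y is the height of the unit normal curve at the point dividing p from q. Note how every foil is coded 0, which is exactly the collapsing of the plural wrong answers into one category that the paragraph describes.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item, totals):
    """Biserial correlation between one item and the total scores.

    item:   1 if the examinee chose the keyed answer, 0 for ANY foil,
            so the plural set of wrong answers collapses to one category
    totals: total-correct scores, treated as continuous
    """
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    p = len(right) / len(item)  # proportion answering correctly
    q = 1.0 - p
    # height of the unit normal curve at the point dividing p from q
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (mean(right) - mean(wrong)) / pstdev(totals) * (p * q / y)


# hypothetical six examinees: the three top scorers chose the key
print(round(biserial([1, 1, 1, 0, 0, 0], [10, 9, 8, 4, 3, 2]), 3))
# prints 1.209 (a biserial r can exceed 1 in small or non-normal samples)
```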

An alternative approach suggested by Lord (1952) involves the
considerations necessary for the addition of scores. In order to add two
numbers, they must be independent; that is, the sets of lattice points
they represent can share no elements in common. By this approach,
individual items should be relatively uncorrelated, but should
collectively form a scale.

Both of these procedures tend to treat a multiple choice item
as a dichotomy, thus overlooking the fact that more than one choice can
be made among the set of foils.

Using a correction-for-guessing. Originally, guessing
corrections for scoring formulae assumed that the number of right
answers attributable to guessing was directly related to the
number of wrong answers given and inversely related to the number of
alternatives per item. Because any answer could be a "guess," no
meaning could be ascribed to particular answers. Meaning was thus
assumed to be confined to some form of cumulative score. This
correction has the effect of increasing the variance of the total scores
because a greater amount is subtracted from the low scores than from the
high ones.
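The classical correction described above is usually written as S = R - W/(k - 1), where R is the number right, W the number wrong, and k the number of alternatives per item. The Python sketch below assumes that formula and no omitted items; the examinee scores are invented for illustration. With no omissions the correction is a linear rescaling of R by k/(k - 1), so the variance of the scores grows by the square of that factor.

```python
from statistics import pvariance


def corrected_score(right, wrong, k):
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items are neither right nor wrong and are not penalized.
    """
    return right - wrong / (k - 1)


# hypothetical examinees on a 40-item test with 4 alternatives, no omissions
n_items, k = 40, 4
raw = [36, 30, 24, 18]
corrected = [corrected_score(r, n_items - r, k) for r in raw]

# low scorers lose more (18 -> 10.67) than high scorers (36 -> 34.67),
# so the spread of the scores widens
print(pvariance(raw), round(pvariance(corrected), 2))
```

Here the variance rises from 45 to 80, a factor of (4/3) squared, which is the widening the paragraph attributes to subtracting more from low scores than from high ones.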

With respect to corrections-for-guessing, Gupta and Penfold
(1961) showed that the guessing correction over-corrects in the event
that the examinee is responding on the basis of misinformation. A
similar argument can be presented to suggest that this correction
under-corrects for the partially informed examinee. More recently, Shuford
and Massengill (1965) elaborated upon a system of "confidence scoring" in
which the examinee rates every alternative on the basis of his
confidence that each particular alternative is right. Honesty is
encouraged because a "confidently wrong" answer loses marks. This
procedure makes it possible to classify each examinee's answer to each
question as 1) well informed, 2) partially informed, 3) uninformed,
or 4) misinformed. This procedure solves the guessing-correction
problem by identifying which items were "guessed," thus increasing the
interpretability of particular items and hence the validity of the test.
The scoring method these authors developed increases the internal
consistency of the test by increasing the true score variance estimates
proportionally more than the total score variance.
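Why confidence scoring can reward honesty is easiest to see with a concrete rule. The text does not give the scoring function Shuford and Massengill used; the sketch below assumes the logarithmic rule (score = log of the confidence placed on the keyed alternative), one well-known rule under which an examinee maximizes his expected score by reporting his true confidences. All names and numbers are hypothetical.

```python
import math


def log_score(report, keyed):
    """Logarithmic scoring rule: log of the confidence put on the keyed answer."""
    return math.log(report[keyed])


def expected_score(belief, report):
    """Expected score for an examinee whose true confidences are `belief`
    but who reports `report` (both are lists summing to 1)."""
    return sum(b * log_score(report, i) for i, b in enumerate(belief) if b > 0)


belief = [0.6, 0.2, 0.1, 0.1]  # a partially informed examinee
honest = belief
overstated = [0.9, 0.05, 0.025, 0.025]  # pretends to be sure of alternative 0

# reporting true confidences earns the higher expected score ...
print(expected_score(belief, honest) > expected_score(belief, overstated))
# ... and being confidently wrong costs more when alternative 1 is keyed
print(log_score(overstated, 1) < log_score(honest, 1))
```

Both lines print True. Under this assumed rule the overstated report loses marks on average, which is the incentive for honesty the paragraph describes.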

Designing a test to form a scale. The argument may be raised
that the practice of distinguishing among individuals on the basis of
total scores without considering the constituents of those scores may
produce information loss. This argument can lead to the proliferation
of subtests, or it can lead to test designs in which the scores form a
scale. For instance, Cox and Graham (1966) propose a system for
designing a test which uses Gagne's task analysis (cf. Gagne, 1965)
to produce a Guttman (1954) scale. In this case, the score may indicate
the level of mastery. Again, the internal consistency of the test can be
increased, this time by increasing item homogeneity.
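Whether a set of scored responses actually forms a Guttman scale is often checked with the coefficient of reproducibility, 1 - errors / (persons x items). The sketch below is not Cox and Graham's procedure; it assumes one common error-counting convention (cells that differ from the ideal pattern in which an examinee scoring s passes exactly the s easiest items), and other conventions exist. Guttman's conventional benchmark for a usable scale is about .90. The function name and data are hypothetical.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a persons-by-items 0/1 matrix."""
    n_persons, n_items = len(responses), len(responses[0])
    # rank items from easiest (most often right) to hardest
    passes = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -passes[j])
    errors = 0
    for row in responses:
        score = sum(row)
        # ideal pattern: the `score` easiest items right, the rest wrong
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        actual = [row[j] for j in order]
        errors += sum(a != b for a, b in zip(actual, ideal))
    return 1 - errors / (n_persons * n_items)


# hypothetical 4 examinees x 3 items
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
flawed = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0]]
print(reproducibility(perfect), round(reproducibility(flawed), 3))
# prints 1.0 0.833
```

In the perfect scale each total score determines the whole response pattern, which is what allows the score to indicate a level of mastery.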

Examining internal test characteristics. One other area which
has led to improvements in the reliability of tests has been
research into improving the definition of the variables being
measured by a test. Research toward this objective has been more
extensive in the area of personality tests than in the development of
achievement tests. The design of personality tests is beyond the scope
of this present study. In view of the scarcity of appropriate research
from the achievement testing area, only two developments in this latter
testing area will be discussed here. First, Ayers (1965) attempted to
validate Bloom's Taxonomy by means of factor analysis of tetrachoric
correlations, using programmed instruction in order to control the
teacher variable. His findings in general supported Bloom's notion of
a hierarchical structure. However, the results did not consistently
fit the classification system in the Taxonomy. A more ambitious study
to this same end was conducted by Kropp, Stoker, and Bashaw (1966).
Their findings were similar to those of Ayers (1965); because of the
light their study sheds on the construction of taxonomic tests, it is
discussed in detail on page 18ff. For our present purposes, it may be
sufficient to say that the validity and the reliability of achievement
tests may be improved by using Bloom's Taxonomy as a guide for
developing the items.

Second, Gupta (1960) showed that the reliability, in an internal
consistency sense, of an achievement test can be improved if the test is
subdivided into subtests based on factor analytic results or on the
basis of the DuBois, Loevinger and Gleser (1952) method of cluster
analysis. This procedure makes subtests from relatively homogeneous
items. This present study used a similar approach.

It should be noted, once again, that these methods tend to 
concentrate exclusively on the "right" answers. 

Reliability based on part-whole comparisons. A special case
of the alternative forms method of determining the reliability of the
test is the group of procedures which use the correlation of one part
of the test with another. The mathematical limit of the repeated use
of the split-half technique, when certain assumptions are made, is found
in the Kuder-Richardson (K-R) formulae. It is this form of reliability
which increases in the DuBois et al. (1952) procedure.

The Kuder-Richardson procedure is most sensitive to differences
in the variance of the test. For this reason, if error variance is
kept constant, increasing the test variance (as when using a
correction-for-guessing) also increases the reliability. Another
method of increasing the variance is to rewrite the test in such a
manner as to move the difficulty (selection ratio) of each item toward .5
(50%). If the item is a dichotomy, a difficulty of .5 (50%) tends to
maximize the test variance, assuming positive inter-item correlation,
because the item variance, p(1 - p), is largest when p = .5.
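The Kuder-Richardson formula 20 and the item-variance point can be made concrete. KR-20 is k/(k - 1) x (1 - sum of p(1 - p) / variance of total scores), where k is the number of items and p the proportion passing each item. The Python sketch below assumes every item is scored as a right-wrong dichotomy, exactly the simplification questioned in the following paragraph; the function names and response matrix are hypothetical.

```python
from statistics import pvariance


def item_variance(p):
    """Variance of a right-wrong item with difficulty p; largest at p = .5."""
    return p * (1 - p)


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items 0/1 matrix."""
    n_persons, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = sum(
        item_variance(sum(row[j] for row in responses) / n_persons)
        for j in range(k)
    )
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical
print(kr20(data))  # prints 0.75
# difficulty (in steps of .1) that maximizes a single item's variance
print(max(range(1, 10), key=lambda t: item_variance(t / 10)) / 10)  # prints 0.5
```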

It is a common suggestion in evaluation texts, for example, that
items should have a middle range of difficulty. This suggestion assumes
that each item should be treated as a "right-wrong" dichotomy. The
plurality of wrong answers is overlooked when items are treated
as a dichotomy.

Current Practice for Evaluating Achievement Tests -- Validity

In addition to the reliability of a test it is also necessary 
to be sure that a test measures the things it is intended to measure, 
i.e. the validity of the test. The validity and the reliability are 
related in that the validity of a test can never be higher than the 
square root of the test's reliability when the latter is defined in 
repeated-measures terms, hence the efforts to increase test reliability. 
The APA Standards lists three types of validity. These are: 

1. Content validity 

2. Construct validity 

3. Criterion-related validity 
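The ceiling on validity mentioned above follows from classical test theory: a validity coefficient cannot exceed the square root of the product of the test's and the criterion's reliabilities, and therefore cannot exceed the square root of the test's reliability alone (the criterion reliability being at most 1). A small Python sketch with hypothetical reliabilities:

```python
import math


def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the correlation between a test and a criterion."""
    return math.sqrt(test_reliability * criterion_reliability)


print(round(max_validity(0.81), 2))  # test alone: prints 0.9
print(round(max_validity(0.81, 0.64), 2))  # unreliable criterion: prints 0.72
```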

The concept of content validity refers to the appropriateness of an
item or test with respect to the information background needed to
answer it. In this present study most of the necessary information
background is supplied in reading selections embedded in the test.

The construct validity aspect of a test. Construct validity
has several aspects. In brief, construct validity refers to some
psychological construct or constructs, more or less independent of
content, which are included in the test. For instance, if intelligence
as a construct is assumed to be manifested by intelligence measures,
such as the WISC, the correlations between a new group I.Q. test and
the WISC scores on the same subjects could be support for the construct
validity of the new test. In this case the construct would be
"intelligence." Another approach to construct validity is to define
the construct in such terms as to facilitate translation into
performance terms. A good example of this approach is Bloom's Taxonomy
(1956). As already indicated, this procedure should increase the
reliability of the test as well as its validity. These constructs should
also be identifiable in the performance of examinees when the performance
data are subjected to statistical analysis.

Another procedure which strengthens support for constructs for 
which measures have not been standardized is "cross-validation." In 
cross-validation statistical analysis should reveal the same constructs 
in independent groups. Cross-validation is used in this study for 
testing the construct validity of the procedure being explored here for 
the development of foils (wrong alternatives) on multiple choice tests.

A final aspect of construct validity concerns the degree to 
which the examiner's objectives have been accomplished by the test he 
has developed. In the absence of standards this accomplishment is 
difficult to measure. One approach is to study the distribution of 
answers to an item to get clues to its effectiveness. Part of the
discussion in Chapter III on the development of the experimental test 
used in this study will elaborate this procedure. 
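Studying the distribution of answers to an item, rather than only its proportion correct, can be sketched as a simple option-by-group tally. The upper and lower 27 per cent groups used below are a common convention in item analysis, assumed here rather than taken from this study; the function name and responses are hypothetical.

```python
from collections import Counter


def option_table(choices, totals, fraction=0.27):
    """Tally which alternative each of the top and bottom scorers chose.

    choices:  the alternative (e.g. "A" to "D") each examinee chose on one item
    totals:   each examinee's total score, used to form the groups
    fraction: share of examinees placed in each extreme group
    """
    n = len(choices)
    size = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    upper = Counter(choices[i] for i in ranked[:size])
    lower = Counter(choices[i] for i in ranked[-size:])
    return upper, lower


# hypothetical item keyed "A": the low scorers split between foils C and D,
# information that a right-wrong tally would discard
upper, lower = option_table(["A", "A", "B", "C", "C", "D"], [10, 9, 7, 5, 3, 2])
print(upper, lower)
```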

In some cases the construct may be sufficiently well defined
that the different performance outcomes are indisputable. In such
cases the construct validity of a test may be easily determined.
Piaget's discussion of the acquisition of various aspects of
conservation concepts is a case in point. Items measuring the
acquisition of these concepts must conform in their discrimination to
the known characteristics of this acquisition process. Where wrong
answers are concerned, as will be discussed on page 27, no such clear
definition exists. The present study, therefore, can be no more than
exploratory in nature.

Criterion-oriented aspects of a test. One of the fundamental
functions of any measurement of achievement is its predictive value for
future achievement. Within the context of the present study, one of the
concerns is whether the experimental test, which by its construct
characteristics may be considered a test of strategies, can improve the
prediction of other achievement test results. Popham
and Husek (1969) point out that most of the statistical procedures used
in current practice may be inappropriate for criterion-referenced tests.

Studies Related to Wrong Answers

At this point the concept of answering patterns becomes 
critical. An answering pattern will, for present purposes, be defined 
as some characteristic among the answers selected by a group of 
students which is consistent and stable under statistical analysis, and 
hence leads to an improvement in the validity and reliability estimates 
of the test to which these answers are given. The works already quoted 
suggest that there may be such patterns among "success" performance.
The question can now be raised as to whether or not there may be 
answering patterns among wrong answers as well. 

A possible source of findings concerning wrong-answer information
is diagnostic testing, which has a considerable history. Schonell
(19^3) discusses the cumulative results of more than twenty years of
research. His procedures showed that the nature and location of
mistakes can reveal specific problems, i.e. wrong answers can be
meaningful for diagnostic purposes. However, error types are usually
established in advance, and are restricted to items which reflect only
one type of error, thus allowing the items to be scored as right or wrong.
Large numbers of questions are needed for this procedure since, as the
complexity of the problem to be solved increases, the number of tasks
required to diagnose all possible errors increases exponentially; this
probably explains the absence of diagnostic tests in "subjective" subjects.
The Cox and Graham (1966) procedure refines this diagnostic technique.
In order to develop diagnostic tests of this sort, an interlocking
pattern of items is usually designed in such a way that specific
weaknesses in a particular student's performance can be inferred. This
procedure identifies weaknesses on the basis of relationships between
items rather than relationships between alternatives within a particular item.

It becomes evident from the fact that diagnosis can lead to the
identification of specific error types that the four categories of
students' responses made by Shuford et al. (1965) may be an
oversimplification. Furthermore, it would seem reasonable that more than
one error type could be accommodated in one item if a multiple choice
format were used. In this latter case, there should be evidence of
answering patterns among wrong answers.

Answering patterns in foil selection. The evidence supporting
the possibility that there may be answering patterns in the wrong
answers as well as the right ones is sparse. Sigel (1963) reported,
with reference to intelligence testing, that children tend to "be
consistent within themselves in the errors they make."

Fouldes and Forbes (1965) reported, in the manual for their
revision of the Advanced Set of Raven's Progressive Matrices, the
following finding concerning common errors:

Four types of common errors could be identified.
(A) Incomplete solutions. There were errors due to people
failing to grasp all the variables determining the nature
of the correct figure required to complete a test item.
Instead they chose a figure which was right as far as it
went but was only partly correct... (B) Arbitrary lines
of reasoning. Here the figure chosen suggests that the
person has used a principle of reasoning qualitatively
different from that demanded by the problem... (C)
Over-determined choices. These were errors involving
failure to discriminate irrelevant qualities in the figure
chosen... (D) Repetitions. These are errors made by
people who simply selected a figure identical with one of
the three figures in the matrix immediately adjacent to the
space to be filled. (p. 267)

Fouldes and Forbes (1965) did not attempt to show whether or
not these common errors were more characteristic of some individuals
than of others.

These types of error would seem to be more related to some form
of answering procedure based on the relational characteristics of the
alternatives than on their informational characteristics.

Powell (1968) factor-analysed some wrong answers derived from
an administration of Gorham's Proverbs Test (1956). This test would
probably be classified as a comprehension test by Bloom's Taxonomy. A
wrong answer pattern of four factors resulted. These were:

1. Reduction of information to effect a simplification of the
statement

2. Addition of irrelevant information

3. Substitution of elements

4. Replacement of proverb by one largely unrelated

If this list is compared with the one by Fouldes and Forbes on
page 16, we find, at least by description, Factor 1 remarkably like
their Class C (Over-determined choices). Possible relationships between
the remainder are less certain, although Factor 4 and Class B may be
related. Their Class D is unlike Factor 3 but is very much like the
"Word-Word Links" class present in the experimental test used in this
study. A definition of this class is on page 33.

However, Sigel (1963) went on to report that there "seemed to
be no relationship between type of error and total score" [p. 53]. In
contradiction to Sigel, Jacobs and Vandeventer (1968) showed, within
the context of Raven's Coloured Progressive Matrices and using the
Guttman and Schlesinger (1967) facet design, that a relationship
often does exist between right and wrong answers. Ebel (1969) has
shown similar systematic characteristics among True-False items.
Furthermore, Powell and Isbister (1969) showed that some wrong answers
can be related to right answers so as to adversely affect the high
scoring students. The type of foil involved was the "irrelevancy." A
definition of this class is on page 37. An inconclusive trend in this
same direction was found in Factor 4, page 17.

Thus, the available evidence, scanty though it is, suggests
that neither "misinformation" nor "no information" (leading to a
haphazard answer) is sufficient to account for all the wrong answers
given to multiple choice achievement tests, and that in certain
circumstances specific foils may influence the total-correct score.

If wrong answers contain achievement information, then the
wrong answers which display systematic characteristics that acceptably
support the construct characteristics of the experimental test should
improve the prediction of independent achievement scores for the same
examinees.

This improvement should occur in comparison with the prediction 
made by either the total-correct scores on the experimental test or 
some reasonable subdivision of these scores into subtest scores where 
the subtests also fit the construct characteristics of the test.

Studies Related to Item Generation

Perhaps the most ambitious attempt to develop tests reflecting 
Bloom's Taxonomy (1956) was the work of Kropp, Stoker, and Bashaw
(1966). These researchers encountered a number of problems in their
work, some of which are discussed here along with the alternative
procedures used in the design of the experimental test used in this 
study. 

The problems they encountered which are relevant to this study
are:

1. Problems arising from the "Knowledge" category of the
Taxonomy

2. The generation of Synthesis and Evaluation category items
in multiple choice format

3. Item analysis problems

4. Problems arising from implicit assumptions in their study

Although the fourth of these problems is probably the most
important for present purposes, the discussion which follows considers
each problem in the order given here.

Problems arising from the "Knowledge" category of the Taxonomy.
Kropp et al (1966) spent some time discussing whether the "Knowledge"
category in Bloom's Taxonomy is a legitimate category and, if so, what
psychological processes other than recall this category might represent.
They further compound the problem by basing their questions on reading
selections supplied in the test. Thus, the legitimate question is
raised as to the meaning of less than a perfect score on "Knowledge"
items when all the information necessary for the answering of these
items is contained in the reading selection used.

These researchers do not comment on the possibility that
"Knowledge" items presented in an "open book" format may not be
"Knowledge" questions in the sense of Bloom's Taxonomy at all. Instead,
these questions, in order not to be obvious, produce a test of search
skills more commonly known in the literature on reading skills as
"reading for details" [cf. Gray, 1960]. It is not surprising,
therefore, that an important contributor to the "Knowledge" category in
two of the grade levels is an unidentified factor consisting in grade
nine of "Word Arrangements, Letter Sets, and Symbol Production"* [p. 131]
and in grade twelve of "Thing Categories, Locations, and Gestalt
Transformations" [p. 134]. All three of these tests were positively
loaded on the unidentified factor for grade nine, and the "Locations"
test is positively loaded on the unidentified factor for grade twelve.
These unidentified factors add credence to the suggestion that the
"Knowledge" category for the Kropp et al (1966) tests may well be more
related to search skills than to recall. There is probably no logical
method of testing "Knowledge" as defined by Bloom's Taxonomy when the
information background is supplied by the test. In the case of the
experimental test in this study, no "Knowledge" category items were
generated.

*These names refer to specific tests from the Kit of
Reference Tests (French, Ekstrom, and Price, 1963) which
purport to define particular cognitive aptitudes.

Generating multiple choice Synthesis and Evaluation items.
Another point made by Kropp et al (1966) was the difficulty of
generating multiple choice items of the Synthesis and Evaluation
categories. One of the problems encountered in this respect is that
Bloom's Taxonomy restricts inductive reasoning to a single subcategory
of the Synthesis category. Another subcategory adds "unique
communication" requirements which are impossible to meet in a multiple
choice format. The third subcategory involves producing a plan or
proposed set of operations. Again, the open-endedness of this
requirement restricts its employment in the multiple choice format.

Second, if the "Synthesis" category is restricted to induction, 
the problem remains that the internal structure of a single reading 
selection is usually highly organized. For this reason, the generation 
of a large number of items which require an inductive combination from 
some components of this selection or an inductive generalization from 
these components is very difficult because both of these possibilities 
are either explicit or closely implicit in the passage. However, if 
more than one reading selection is included in a test of this type, it 
would seem to be a relatively simple matter to generate items which 
require inductive combinations between selections or inductive 
generalisations between selections. This latter procedure is used in 
the experimental test in this study. 
It is possible that the nature of the strategies employed by
examinees when solving problems affects the effective classification
of the item by the Taxonomy. Two outcomes would be expected in
this case. First, the more familiar an examinee is with the content of
the problem, the lower the effective classification of that problem.
Second, the nature of the strategy shifts employed for generating foils may
influence the strategies which the examinee has to employ to answer the
problem, which in turn may also affect the classification of the problem.
For instance, an item classified as a Synthesis item on the basis of the stem
alone or the stem and right answer may become a Comprehension item if the
foils stress reading comprehension. Perhaps the rather surprising
apparent dislocations of the Evaluation items in the Kropp et al (1966)
study reflect this problem. It should be noted that the Evaluation
category occurs in the second, third, and sixth positions in the ordered
Simplex (Guttman, 1954) analysis and additionally in the fifth position
by mean score [cf. Kropp et al, pp. 83, 87]. This latter problem is
elaborated below, where the implicit assumptions of the Kropp
et al (1966) study are discussed. Only three Evaluation items are
used in the experimental test because of its length (30 items).

Problems of item analysis on taxonomic tests. Another problem
which Kropp et al (1966) discuss at some length is the problem of item
analysis for tests designed to measure levels in a Taxonomy. The
Taxonomy was developed on the basis of the assumption that each higher
level subsumes all lower levels and adds some unique characteristics of
its own. Thus, as the level of the Taxonomy increases so does the
complexity of the problems which are appropriate to this level. It
would therefore be expected that the difficulty of items designed for
each category would increase as the level of the Taxonomy for which
these items were designed increased. Thus, the selection of items on
the basis of approximate middle difficulty at each level of the Taxonomy
in order to maximize discrimination would seem to be inappropriate.
This subsumption property also implies that if any item were missed at
any level of the Taxonomy, all items designed for higher levels of the
Taxonomy which involve the context of the item missed should also be
missed. As a result, the number of items correct at any level of the
Taxonomy should determine the upper limit of the possible score for the
next higher level.
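The subsumption property can be checked mechanically for a single examinee. The sketch below assumes a hypothetical encoding in which each item is labelled by its information background ("context") and by an ordinal Taxonomy level, and it lists every pair in which a lower-level item in a context was missed while a higher-level item in the same context was answered correctly. Under strict subsumption the list should be empty. The function name and the example data are illustrative only and are not drawn from the Kropp et al (1966) materials.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each item is identified by (context, level), where
# "context" is the information background (e.g. a reading selection) and
# "level" is the item's Taxonomy level (higher = more complex).
# The value is True when the item was answered correctly.
Item = Tuple[str, int]
Responses = Dict[Item, bool]


def subsumption_violations(responses: Responses) -> List[Tuple[Item, Item]]:
    """Return (missed, passed) item pairs that contradict strict subsumption.

    Under strict subsumption, missing an item in a context implies missing
    every higher-level item in the same context, so any higher-level item
    answered correctly after a lower-level miss is reported.
    """
    violations = []
    for (ctx, lvl), correct in responses.items():
        if correct:
            continue
        for (ctx2, lvl2), correct2 in responses.items():
            if ctx2 == ctx and lvl2 > lvl and correct2:
                violations.append(((ctx, lvl), (ctx2, lvl2)))
    return violations


example = {
    ("passage A", 2): True,
    ("passage A", 3): False,
    ("passage A", 4): True,   # passed above a miss: inconsistent with subsumption
    ("passage B", 2): False,
    ("passage B", 3): False,  # missed above a miss: consistent
}
print(subsumption_violations(example))
```

Reporting pairs rather than a single yes/no verdict keeps the information needed to ask the alternative question raised below, namely which misinterpretations carry over to higher-level items.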

Kropp et al did not test this latter hypothesis in their study
by examining individual performance to see whether or not individuals
who answered a particular Knowledge question incorrectly tended, in
general, to miss all higher level questions related to the same
information background. They did, however, mention that low scorers on the
Knowledge subtest tended to be low scorers on all subtests. An
alternative hypothesis which might be posed is whether or not those people
who misinterpreted a particular knowledge question are more likely to 
miss a high level item from the same background if one of the foils 
contained the same misinterpretation than if it did not. Although this 
latter alternative presents an hypothesis which is beyond the scope of 
this present study, it is more in keeping with the possibility of the 
influence of systematic choice behavior on response selection as 
developed here, than is the former hypothesis. 

It is true that for dichotomous variables, discrimination
is maximized for items of middle difficulty. In general, if all
alternatives are to be considered, discrimination is maximized if the
selection frequency for all response alternatives on any item is equal.
Thus, for a four-alternative item, discrimination is maximized when the
difficulty is .25 and all four alternatives are chosen equally often. In
the case of forcing a dichotomy on a polychotomous variable, the fact that 50 per
cent of the examinees get the item right means that the distribution of
answers on this item is not the product of chance, at least for the
right answers. The same conclusions may be true for wrong answers, as
Powell (1968) has shown, when higher mental processes are involved.
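One way to make the claim about equal selection frequencies precise, offered here as an assumption rather than as the measure used in this study, is to count the proportion of pairs of examinees that an item separates, that is, pairs who choose different alternatives. For a large group with selection proportions p_1, ..., p_k this proportion is approximately 1 - (p_1^2 + ... + p_k^2), which is largest when every p_i equals 1/k: .50 for a right/wrong split at p = .5, and .75 for four alternatives each chosen by 25 per cent of the examinees. The function name `pair_discrimination` is hypothetical.

```python
def pair_discrimination(proportions):
    """Proportion of examinee pairs separated by an item (large-group approximation).

    `proportions` are the selection proportions of the alternatives; two
    examinees count as "discriminated" when they chose different alternatives.
    """
    total = sum(proportions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # Probability that two examinees chose the same alternative is sum(p^2).
    return 1.0 - sum(p * p for p in proportions)


print(pair_discrimination([0.5, 0.5]))                # right/wrong split at p = .5
print(pair_discrimination([0.25, 0.25, 0.25, 0.25]))  # four alternatives, equal use
print(pair_discrimination([0.5, 0.3, 0.1, 0.1]))      # half right, foils used unevenly
```

On this measure a four-alternative item answered correctly by half the group separates fewer pairs than one whose alternatives are all used equally, which is consistent with the argument for not selecting items on middle difficulty alone.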

For these reasons, it may be reasonable to ignore item difficulty
as a criterion of item discrimination, except for very easy or very
difficult items. At least the former argument with respect to ascending
complexity, and the related ethical problem of predetermination of
hypothetical results, were the basis for Kropp et al (1966) ignoring
item difficulty in the preparation of their tests. The latter argument
with respect to the discriminative power of polychotomous items, except
in extreme cases, was the basis for minimizing the importance attributed
to item difficulty in the present study.

As Kropp et al (1966) point out [p. 77], an additional problem
with respect to item analysis arises in the interpretation of
correlations on data derived from taxonomic tests. Since the subtests are
assumed to be hierarchically interdependent, the bivariate distributions
of scores between subtests appear triangular, making the distribution of
each higher level skewed further to the right. On this basis the
total-correct score may not be normally distributed for tests of
practical length used on groups of usual size; hence the use of biserial
correlation for the validation of an item against the total test score
may be inappropriate. This fact, as they point out, also raises
problems in the interpretation of any correlation coefficient in their
study. When determining the discrimination coefficient, Kropp et al
(1966) used the traditional procedure.

Problems which arise from implicit assumptions in the Kropp,
Stoker, and Bashaw study. The central assumption of the Taxonomy is
that each higher level subsumes all lower levels and adds
characteristics of its own. For this reason, Kropp et al (1966) approached their
analysis with the implicit assumption that the complexity dimension was
characteristic of the Taxonomy as a whole rather than being a
characteristic of each level or subcategory within the Taxonomy. Their
findings with respect to this assumption were inconclusive. Analysis of
the subtest scores showed that the levels of the Taxonomy did not fall
consistently into the hierarchical order hypothesized. On the other
hand, Powell and Isbister (1969)
tested the assumption that hierarchical categories should be obliquely
related. Their finding, however, was that the use of a promax rotation
did not improve the resolution of the factors when right and wrong
answers were combined; thus the expected obliqueness did not occur.

It has already been indicated on page 21 that the Evaluation
category can occupy most positions above the Knowledge level. The
Kropp et al (1966) study also found that on the basis of cognitive
attributes no single category is consistently defined for all grade
levels tested, although the tasks themselves were identical for all
grade levels. These two findings of Kropp et al are inconsistent with
the Taxonomy as defined. Perhaps the Taxonomy is actually a
description of some of the strategies employed by humans in problem-solving
situations. There may be a hierarchical order to these strategies but
they may not be taxonomic in Bloom's sense of the term.

A problem which is a Synthesis-level problem for a five-year-old
may be a Comprehension-level problem for a twelve-year-old. In this
context two deviations from Bloom's Taxonomy would be expected. First,
each category of the Taxonomy, with the
possible exception of "Knowledge," should be characterized by a range of 
complexity levels within the category in addition to an order of 
complexity levels between categories. In such circumstances, as Kropp 
et al (1966) demonstrate, a wide range of possible orders may occur 
among specific samples from category levels. In addition to the most 
common, and expected, order of the categories found in their study, the 
categories occur at least once in any one of three other orders under 
Simplex analysis. 

Second, the strategies involved at different developmental 
stages will vary in accordance with the information and strategy back- 
grounds of the individuals at these stages. For this reason, striking 
dissimilarities in the cognitive attribute content on the basis of the 
Kit of Reference Tests (French, Ekstrom, and Price, 1963) for any
category would be expected at different developmental levels. This is
precisely what Kropp et al found. An important characteristic of this
change should be its movement toward simplicity. For instance, if we
reclassify the Kropp et al (1966) "Knowledge" category as a "search"
category on the basis of the unidentified factor, we find that for grade
nine the positively correlated cognitive aptitudes are Word Arrangement,
Letter Sets, and Symbol Production, which suggests that the grade nines may
be generating their search strategies as they proceed with the test.
For the grade twelves the positive attribute is Locations, which suggests
a simpler and more direct approach.

Another factor which Kropp et al (1966) discussed is that the
difficulty of a problem may be affected by the complexity of the problem.
It also may be affected by the familiarity or obscurity of the
information background and/or strategies required by the problem solver. It
may also be affected by the nature and the fineness of the
discriminations which the solution to the problem requires. This latter aspect
may be related to the nature of the foils. Kropp et al (1966) deal only
briefly with the difficulty problem [cf. pp. 90 and 152].

Contributions of the Kropp, Stoker, and Bashaw Study to the
present study. It may be possible to assume that Bloom's Taxonomy is
not a subsumptive taxonomy. In this case, the Kropp et al (1966) study
more strongly supports the possibility of the transcendence of process
over content than their interpretation of their findings suggests. This
transcendence of process over content has also been supported by Furth
(1966) in his work with the congenitally deaf.

In combination with the other research already discussed 
(see p. 16) there would seem to be at least three variables which 
contribute to the choice behavior of examinees on multiple choice 
achievement tests. These are: 1) content, 2) process, and 3) effective
complexity. A fourth possible variable is item difficulty (see p. 26).
Since misinterpretation of content and inappropriate selection of
strategies might both be expected to lead to the selection of an
inappropriate response, it is reasonable to assume that at least some students
will display systematic wrong-answer selection. Hence, in a
forced-choice situation the nature of the alternative choices provided would
be expected to influence the nature of the selections made. If foils
are deliberately designed to reflect probable misinterpretations of 
content, or probable inappropriate selections of strategy, more than 
the "right" answers might be used to determine the present achievement 
status of the examinee. 

How can tests which meet these criteria be developed? It is 
fairly clear from the Kropp et al (1966) study that Bloom's
Taxonomy is useful as a set of guidelines for the construction of the
relationship between the stem and the right answer for each item. A 
discussion of common recommendations for the development of the foils 
for each item is presented in the following section. 

Recommendations for Construction of Foils 

The following discussion reviews what some textbook authors
have had to say to teachers about the construction of foils for multiple
choice items. Among these authors, Ross and Stanley (1954) list
fourteen rules for the construction of multiple choice items. Of these,
only two deal specifically with foil (distractor) construction:

6. Make all responses plausible 

9. To measure higher levels of understanding, increase the 
homogeneity of the options provided [p. 185].

These authors do not define plausibility, and the example they
use for increasing the homogeneity of options actually illustrates
increasing the content specificity of the item. Their second suggestion
involves increasing the fineness of discrimination between alternatives,
which may be more related to the difficulty of the item than to "higher
mental processes."
As another example, Thorndike and Hagen (1961), in their second
edition, list ten "maxims for multiple choice items." Four of these
have direct bearing on foil construction. Quoting the original we
find (italics in original):

4. Be Sure That There Is One and Only One Correct or Clearly
Best Answer.

5. Beware of Clang Associations.

8. Beware of the Use of One Pair of Opposites as Options If
One of the Pair Is the Correct or Best Answer.

9. Beware of the Use of "None of These," "None of the Above,"
and "All of the Above" as Options. [pp. 74, 75, 76, and 77]

Whether or not there should be more than one "correct" answer
will depend upon whether or not the examiner wishes to discriminate
between levels of insight into a particular problem, as in the "best
answer" type of test. However, to make such discriminations may require
the use of information from more than one alternative of any item.

One of Hoffmann's (1962) most damning criticisms of the multiple
choice types of tests arises from the arbitrary assignment of only one
alternative of the response set to the "right" category in tests of this
type in such a way as to discriminate against the thoughtful,
well-informed student. This admonition is only appropriate if we are to
assume that the only answer to be taken into account for any particular
item is the one designated as "right," whereas the "rightness" may be
arranged on a continuum in the "best answer" type of test.

Thorndike and Hagen (1961) quite rightly point out that clang
associations (see number 5, p. 28) between stem and right answer tend
to give the answer away. However, using superficial associations
between the stem and the wrong answers may in some circumstances be an
effective discriminating device (see "Word-Word Link," p. 36).
Thorndike and Hagen's (1961) maxims numbered eight and
nine (see p. 28) are interesting in that they suggest that certain
aspects of the lexical relationships between answers and foils should
be considered in foil construction. If a student selects an answer
belonging to a set described in number eight (see p. 28) that is
logically opposite to the "right answer," then this selection in itself
may contain useful information. Such a selection reveals, at the very
least, which students completely misunderstand the relationship in
question. Why this sort of alternative should be discarded without
qualification is therefore not clear. The criticism these authors
make of the "all of these," "none of these" type of alternatives has a
similar basis. They neglect to say that if "none of these" is correct
it may be regarded as being logically equivalent to an omission. The
"none of these" provides a noncommittal response which has the effect
of making closed-choice alternatives into open-ended alternatives. For
some purposes it may be useful to know if the student made one of the
less common errors, if there are more possible errors than the foils
account for. In addition, omissions at the end of the paper can also
mean "not finished." Since there is more than one possible reason for
omitting an item, the interpretation of an omitted response becomes
ambiguous. For these reasons, the basis upon which a student makes a
noncommittal response may be a valid question for study.

More recently, Ebel (1965) lists 48 "suggestions for preparing
good multiple choice test items." Of these 48, only five directly
relate to foils or "distractors." He also rates these as "desirable"
or "undesirable." Quoting the original:

32. Item using true statements as distractors. (Desirable)

33. Item using stereotypes in distractors. (Desirable)

34. Item using obscure distractors. (Undesirable)

35. Item using a highly implausible distractor. (Undesirable)

36. Item involving verbal trick. [pp. 183-184] (Undesirable)

The first two of these are examples of the use of errors in
logic which Sanders [1966, p. 104] suggests we teach students to
recognize, but on which he does not elaborate with respect to measurement.

Ebel's (1965) suggestion numbered 34 above proposes
that the use of obscure or complex vocabulary is undesirable. On the
contrary, if the intention of the examiner is to study responses to
obscure, ambiguous or complex situations, this type of item may be
desirable. Although other methods may have certain advantages when
measuring complex human behavior, the multiple choice method retains two
particular advantages. First, a high level of control can be maintained
in the alternatives supplied, so that the "controlled sample of
performance" characteristic of all tests can be very explicit. Second, once
the performance components of the complex behavior which is to be
observed have been established, making accurate counts of the frequency of the
choices which fit the categories of alternative (whether right or wrong)
designed to measure these components is a simple matter. Other
measuring instruments have other advantages at the expense of these two.
The study of such items would probably necessitate examining all 
responses to each item. Thus, Ebel's (1965) proposal that this type of 
item is undesirable can be considered valid only if the "one right 
answer" assumption is considered valid. Closer scrutiny of this entire 
problem seems reasonable. 
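The second advantage noted above, that choices can be counted by the category of alternative they fall into, amounts to a simple tally once each alternative has been classified. In the sketch below the three-item key, its letters, and the assignment of categories to alternatives are all hypothetical; the category labels merely echo names used in this chapter and are not the study's actual classification.

```python
from collections import Counter

# Hypothetical scoring key: for each item, the category each alternative was
# written to represent. "right" marks the keyed answer; the other labels are
# illustrative foil categories only.
KEY = {
    1: {"A": "right", "B": "irrelevancy", "C": "word-word link", "D": "opposite"},
    2: {"A": "opposite", "B": "right", "C": "irrelevancy", "D": "partial information"},
    3: {"A": "word-word link", "B": "partial information", "C": "right", "D": "irrelevancy"},
}


def category_profile(answers):
    """Count how often one examinee chose each category of alternative.

    `answers` maps item number to the chosen letter. Omitted items are
    counted separately, since an omission can have more than one meaning.
    """
    profile = Counter()
    for item, options in KEY.items():
        choice = answers.get(item)
        profile[options[choice] if choice in options else "omitted"] += 1
    return profile


print(category_profile({1: "B", 2: "C", 3: "C"}))
```

Keeping omissions as their own category, rather than folding them into "wrong," preserves the ambiguity discussed earlier so that it can be studied rather than hidden.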

In suggestion 35 (p. 30), relating to implausibility,
the problem of a definition for plausibility arises once again.
Plausibility may be a function of the rationale used in determining the
construct and content validity of the test. The examiner must be able
to anticipate what alternatives may be plausible to the examinees.
Without a definition of plausibility, implausibility is impossible to
determine. In fact, plausibility is often defined on a post hoc basis
from the item analysis, with foils having a low selection ratio being
classified as "implausible." However, if the purpose of discrimination
is to identify individuals for differential treatment, a foil which
identifies ten or twelve out of 1,000 students may be more valuable than
one which identifies 250 students.

Finally, many foils which seem to involve a "verbal trick" may
have a valid function. These verbal tricks are probably of three kinds.
The first kind could be the introduction of a peculiarity of wording
designed to produce interpretive or misreading errors on the part of
some students. The second kind of "verbal trick" is found in such
things as Zeno's Paradoxes (c. 340-264 B.C.), in which the "verbal
tricks" involve a faulty assumption in the reasoning. The third kind of
"verbal trick" introduces the possibility of detecting in the examinee
an inappropriate "set" for the correct solution of the problem. Both
the Einstellung effect and "functional fixity" may possibly be used to
develop examples of foils of this type. In each of these cases it is
conceivable that the information generated from responses to these types
of item could have discriminative value. The issue here, once again,
is both content and construct validity. Does the "verbal trick" give
the intended information, or interfere with the obtaining of this
information?

We find this same ambiguity of advice prevailing throughout the
range of standard texts in this area. From the ETS booklet Multiple-Choice
Questions: A Close Look (1960) through to such writers as
Ahmann and Glock (1963), Gronlund (1965) and Noll (1965), we find the
great bulk of the suggestions about item writing discussing the
functional, linguistic, and structural characteristics of the stem, and
stem-right answer relationships, with only minimal and often
contradictory treatment of the foils and how to construct them.

Need for a Basis for Interpreting Foil Selection

On the basis of the above discussion we can identify several 
general bases for foil construction as presented to constructors of 
multiple choice tests. These are: 

1. Logical Relationships

2. Logical Errors

3. Partial Information

4. Misinformation

5. Obscure Relationships

6. Misunderstanding

7. Verbal Tricks

Not all of these are regarded favourably by the authors 
mentioned nor are these bases consistent with themselves or between one 
author and another. It would seem that the recommendations have been 
developed on a trial-and-error basis derived from the experience of 
professional test constructors during their attempts to meet the 
statistical criteria of an "effective item." 

The Possible Value of the Experimental Test

The testing technique being explored in this study hypothesizes
that Bloom's Taxonomy adequately describes the strategies leading to
right answers, and that a set of logically based guidelines for foil
development effectively describes some of the possible systematic
deviations from the ideal outcomes of these strategies. These two
facets combine to form the construct characteristics of the testing
technique under study. Of course, any findings from a purely
exploratory study must be tentative. However, wrong answers from a "strategy"
test may increase the predictive power of that test for total achievement
scores (found in the usual way) from independent achievement tests.
In this case, more than the information background of a test may be
involved in "success" on an achievement test. Such findings would
strengthen the support for the hypothesis that process may transcend 
content. Furthermore , this study may suggest some of the typical types 
of errors students may make as they mature intellectually which might 
eventually lead to the establishment of a behavioral description of 
development which is independent of test content, and of educational 
strategies which may be appropriate to the stages and phases of this 
developmental sequence. 

On the application side, the main advantages of guidelines for
foil construction would be expected to involve 1) the simplification
of item writing, 2) clarification of why a foil is wrong, and 3) the
possibility of producing diagnostic tests in subjective content areas.



CHAPTER III

DESIGN OF THE EXPERIMENTAL TEST

From the discussion developed in Chapter II, the usefulness of
Bloom's Taxonomy for the development of process-oriented items was
suggested. The possibility that Bloom's Taxonomy may not display the
assumed subsumptive characteristic between categories does not minimize
its role relative to the establishment of the construct validity of a
test. The evidence presented suggested that there is no similar set of
internally consistent construct guidelines for the development of foils.
Seven general categories of foil based upon recommendations from the
literature could be established. Using these seven as a starting point,
the first task in this chapter is to develop a systematic set of
Guidelines which may prove helpful for foil construction. The seven
categories can be further reduced. Perhaps the most important category
involving strategies consists of those foils which can be based on probable
errors in logic made by the examinee. Partial information can lead to
an error in logic if the wrong strategy is used to generate the missing
information. It can also lead, along with other causes, to an
oversimplification of the problem. Since only the product of the choice
behavior is observable on a multiple choice test, it would be reasonable
to include Logical Errors, Partial Information, some Verbal Tricks, and
perhaps some Logical Relationships (like answers which are opposite to
the right ones) in a list of categories of foil generation where higher
mental processes are to be tested.

Misinformation and misunderstanding may be identical, or they may
differ in that the misunderstanding may be related to the reading
of a specific item or group of items rather than to a weakness in the
information background of the examinee. If the examinee succumbs to
certain kinds of verbal tricks (for example, the use of meaningless
jargon in a foil), his problem may be more immediately test-related than
background-related, provided that he is not misled elsewhere on the test
when jargon is not used. In the present context there is the
possibility that, in addition to the process variables, there may be a class
of foil related to the linguistic characteristics of the item. This
class of foil may be designated as a "Misreading" class.

Another possibility is that the examinee has systematically
misclassified a particular piece or set of information. In items based
on the possible logical relationships among the total information
background, this piece of misinformation will lead to the systematic
selection of specific wrong answers each time this misclassification
appears in a foil; an example is the person who confuses the work of
Hebb with the work of Hull. Foils of this type, and of several others,
are beyond the scope of the present study and are, therefore, classified
in the "Others" class. Subsequent research may be expected to elaborate
on this latter problem.

The Guidelines for Foil Construction 

Our earlier discussion showed that similarities can be found
between common errors on nonverbal tests and verbal tests (see p. 17).
The present experimental test in its original version was based on
classifications involving logical fallacies and logical relations. The
test has been revised in an attempt to improve item discrimination for
the present study. The same reading selections, general questions,
and overall format remained unchanged. The Guidelines which follow
were used to revise the foils.

Since the definitions of foil classes, as they were originally
used, tended to lack precision, they were redeveloped for this study.
The Guidelines are described below (see p. 37) in terms of the
procedure used for constructing each type of foil. Four classes were
produced:

1. Strategy class; the largest group of Guidelines to be
developed for this study is based on the logical 
characteristics of the foil relative to the right answer 
and information background. Because these types of foil 
are suggestive of incorrect analytic procedures they are 
collectively referred to as the "strategy class." 

2. Misreading class; this group of foils is based on semantic
characteristics of the foils relative to the right answer
and information background. The nature of this test,
i.e. an open book test, would be expected to reduce the
possible number of foil categories in the misreading class
because an examinee who feels he has misread an item can
refer directly back to the information background supplied.
This class of foil probably has many more members which
would describe different aspects of misreading where
information recall is the source of information. An
example of this situation is the Jargon (J) category
(see p. 39).

3. Other; the foils in this class are unclassifiable, at least
by the present Guidelines. Future studies are expected to 
reduce but not eliminate this class of foils. 
4. Misinformation class; the nature of the experimental
examination (i.e. an open book examination) precludes the
development of misinformation foils which would be expected
to occur in the context of a test requiring information
recall. These would be expected to be related to "Knowledge"
level items, a level of item which was not used in this
examination for reasons already discussed (see p. 10 ff).

The first two of these major classes may be subdivided on the
basis of a specific description of how a foil which fits any particular
category is produced. This subdivision follows:

Guidelines

A. Strategy Class 

1. Overgeneralization (OG). In the development of this type of
foil the author retains the correct relationship of the
best answer in its entirety and adds some irrelevant
information. (For example, see item 1A, p. 154.)

2. Oversimplification (OS). In this case the author omits one
or more parts of the best answer. (For example, see
item 2C, p. 156.)

3. Inversion (Inv). In this case the author makes a statement
in some way opposite to the best answer. (For example,
see item 4C, p. 158.)
4. Irrelevancy (Irr). In this case the author makes a true
statement which is unrelated to the best answer, or a
statement which could be a correct answer (perhaps by
virtue of some restriction in the stem). (For example,
see item 1D, p. 154.)

5. Invalid Assumption (IA). In this case the author begins
with an unwarranted assumption about the background or
solution to the problem and thus writes a foil which would
be correct if this assumption were valid. (For example,
see item 1C, p. 154.)

6. Substitution (Sub). In this case the author replaces at
least one of the elements or the relationships of the best
answer by a corresponding element which is less acceptable.
(For example, see item 2B, p. 156.)

7. Transposition (Tr). In this case the author modifies the
order of the elements in an ordinally dependent relationship.
(For example, see item 30C, p. 181.)

8. Common Misconception (CM). In this case the author utilizes
his knowledge of the probable common misconceptions held by
the examinees to write the foil. (For example, see item 5B,
p. 159.)

B. Misreading Class 

1. Word-Word Link (WW). In this case the author produces a
false statement which has strong verbal links with the stem
or background information by either repetition or
association. This type of foil may be similar to Foulde's (1965)
Class D error (see p. 16). (For example, see item 7B, p. 162.)

2. Redefining of Terms (RT). In this case the author uses a
word or words in the foil in a different literal or
connotative sense than it is used in the stem or background
information. (For example, see item 11D, p. 166.)
(This type of foil misleads certain of the best students,
perhaps the more imaginative ones; see p. 13.)

3. Jargon (J). In this case the author produces a
quasi-meaningful statement which tenuously relates in some manner
to the best answer. The use of coined "near words" may also
be present. (Not used in experimental test; see p. 36.)



C. Others 

1. Others (O). In this case the foil is, at present, for some
reason, unclassifiable.

These are the Guidelines which were used in the construction of
the foils in the experimental test.

Structure of the Experimental Test 

As already mentioned, Bloom's Taxonomy was used as a guide for
the construction of the stems and right answers of the experimental test.
The interrater reliability for the advance classification
of right answers was reasonably high (r = .83). The Guidelines just
given on pages 37-39 were used to construct the foils. The interrater
reliability for foil classification was somewhat lower (r = .62, N = 5).

As in the case of Bloom's Taxonomy for the right answers, the
Guidelines presented the immediately evident advantage of increasing the
number of possible foils which could be considered for any one item,
making foils easier to generate than they were in the more usual
"hit-and-miss" method. An additional advantage of the Guidelines became
evident after the earlier administration of the test: the Guidelines
help clarify why any foil should be considered wrong. The
absence of such a basis is a common weakness of teacher-made tests.

The examination consisted of five short reading selections drawn
from material which was in some way related to educational psychology,
since this was the central topic of the course in which this examination
was to be used. They were also chosen on the basis that it was
relatively unlikely for the examinees to have encountered the works from
which these selections were drawn in their previous training. To the
extent that these selections were specifically oriented to the
vocabulary of the studies of psychology and education, this test demanded
information recall from the examinees. Aside from this restriction, it
was assumed that all items could be answered correctly solely on the
basis of the information given in these selections. This assumption
may not have been entirely warranted.

Since most of the necessary background information was assumed
to have been supplied in the test, no Knowledge category items were
generated. On this basis, the test was intended to be a "higher mental
processes" test. Since the major emphasis of the test involved logical
analysis, it was assumed that the test was essentially an "Analysis"
level test. The findings of the preliminary version as reported in
Powell and Isbister (1969) confirm this assumption.

Content and Construct Characteristics of the Experimental Test

A detailed item-by-item discussion of the test may be found in
Appendix B (see p. 153 ff). In brief, five reading selections related
to the area of educational psychology were chosen on the basis of
information density and the unlikelihood of the examinees having
encountered the selections previously. These selections, which are
both given and referenced in Appendix B, are referred to subsequently as:
1. Stupidity
2. Awareness
3. Aggression
4. Discipline
5. Progress

The 30 items in the test were classified using Bloom's Taxonomy
as a construct model as indicated in Table 1 (p. 42) and elaborated in
the discussions in Appendix B. No Knowledge-level items were developed.
Items were classified as Synthesis if they required the examinee to
organize the material from more than one reading selection into some
systematic relationship when deciding which alternative to select for an
answer. A reasonably high interrater reliability (r = .83) was found for
the classification of the items based upon the item format, the stem, and
the stem-right relationship when the right answer was indicated.
Disagreement occurred among several raters, and among other reviewers of
this study, on the keying of some of the items. This disagreement would
be expected from the subjective nature of the content and the related
differences among the value systems inherent in any group.

Much less agreement among raters was found for the foil
classification (r = .62). This poor result was expected for the
reasons already given (see pp. 21-5) and because of the findings of
Kropp et al. (1966), which suggest multiple interpretations of specific
items and alternatives within heterogeneous groups. This multiplicity
would be expected to increase with the complexity and subjectivity of
the content, so that a high level of agreement, even among professionals,
on the particular test used in this study would be unlikely.

A word of caution is in order. This low interrater reliability
suggests that readers following the item-by-item discussion in
Appendix B may disagree with the foil classification given and
with the reasoning behind it. It would be interesting for the
reader to record his disagreements and to compare these with the
results of the cluster analysis as given in Table 11, page 73,
and Table 12, page 77.

TABLE 1

CLASSIFICATION OF ITEMS USING BLOOM'S TAXONOMY (30 ITEMS)

Bloom's Category         Item Numbers                                          Total
Comprehension            3, 6, 12, 15                                              4
Application              5, 7, 11                                                  3
Analysis                 1, 2, 4, 8, 9, 10, 13, 14, 16, 17, 18, 19, 20, 21        14
Synthesis / Evaluation   22, 23, 24, 25, 26, 27, 28, 29, 30                    6 / 3

To illustrate the extent of this problem, a check was conducted.
One of the raters of the items disagreed with the classification of
three foils in particular. Of these three, only one of his
reclassifications was supported by the cluster analysis as given in the
results of the study (see p. 182 for details). This one-in-three success
ratio was equivalent to that of the experimenter.

The overall appearance of the experimental test suggests that in 
the traditional sense it is a very poor one. The internal consistency 
value for the test was K-R 20 = .jk. A review of the item difficulties 
and biserials from Table 40 of Appendix A (p. 150) is equally
discouraging. However, the use of Bloom as a model for the right answers and
the Guidelines for the wrong answers suggests that the test should not
be considered homogeneous. For this reason, and for the reasons given
earlier when discussing this same problem in relation to the Kropp et al.
(1966) study (see p. 21 ff), the use of traditional evaluative
procedures on this test may be questionable. Support for this position is
found in the Procrustes rotation of the factors to fit the clusters,
which gives six nearly orthogonal factors displaying quite adequate
internal consistency (see p. 70).
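The K-R 20 value cited above is the Kuder-Richardson formula 20, KR-20 = k/(k-1) * (1 - sum(p*q) / var(total)), where k is the number of items, p and q are the proportions passing and failing each item, and var(total) is the variance of the total-correct scores. The sketch below is only an illustration of that standard formula applied to a 0/1 right-answer matrix; the function name is hypothetical, and population variance is assumed so that it matches the p*q terms. Because the formula assumes one homogeneous trait, it penalizes a deliberately heterogeneous test such as this one.

```python
import numpy as np

def kr20(scores):
    """Kuder-Richardson formula 20 for a 0/1 item-score matrix.

    scores: array of shape (n_examinees, k_items); 1 = right, 0 = wrong.
    Returns inf/nan if every examinee has the same total score.
    """
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)        # item difficulty: proportion answering right
    q = 1.0 - p
    total = X.sum(axis=1)     # total-correct score per examinee
    var_total = total.var()   # population variance, consistent with p*q
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

# Four examinees, three items (a perfectly cumulative pattern).
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```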

The foil classification procedure differed from the item
procedure in two important respects. First, although the Guidelines
were also used as a model for the possible information content of the
foils, the relationship between this model and any possible
characteristics of the examinees was largely unknown, given the almost
total absence of research. Second, whether these Guidelines formed
mutually exclusive categories, or a hierarchy paralleling Bloom, could
only be inferred from the assumptions which went into their development.

These foil Guidelines were used to help develop the foils as
well as to classify them. A summary of the classification of the foils
is given in Table 2.



TABLE 2 

CLASSIFICATION OF WRONG ANSWERS 
USING THE FOIL GUIDELINES 



Classification of Foil 

A B C 

Strategy Misreading Other 



OG 


OS 


Inv 


Irr 


I A 


Sub 


Tr 


CM 


WW 


RT 


0 


1A 


2C 


40 


IB 


1C 


2B 


30C 


5B 


7B 


11D 


All 


4A 


3D 


8D 


2B 


5A 


3A 




9B 


10D 


15B 


foils 


6B 


5C 


9B 


3C 


8A 


6C 




10A 


13B 


29D 


in 


8C 


7D 


10 C 


kv 


13 A 


7A 




12B 


15C 


30D 


items 


12C 


11C 


11A 


61) 


25B 


90 




130 


3 OA 




19-24 


1A-D 


26C 


12D 


IkA 




140 




16D 






26B 


15A 


29C 


17C 


10 A 


27 A 


17D 




27D 








16C 






18C 


28D 


18A 




29A 








17B 






25C 




25A 













18D 2oL 
280 270 
28B 



11* 7 7 12 7 9 1 8 

* = Frequency of foil in each category.
The test used in this study is a revision of the one reported
in Powell and Isbister (1969), which had a slightly different purpose.
The present discussion, supported by the item-by-item analysis given in
Appendix B, would seem to demonstrate that, for all the faults of the
instrument, the content and construct requirements for this test as
laid out in Chapter II have been met to a reasonable degree of
acceptability.

In the Powell and Isbister (1969) study the advance
classification was taken as given and profile scores were developed
accordingly. The resulting score sets were treated as independent
variables and subjected to principal axis factor analysis in order to
determine relationships among these scores. In the present study, the
advance classification was not taken as given but was compared with a
cluster analysis based upon the relationships found among each
of the alternatives. That is, the present study examined the
acceptability of the advance classification system as exemplified in
the test.

On the basis of what has already been said about the problems
that communications of this type produce, it would be reasonable to
expect that the advance classification systems used in this study would
not hold up without qualifications arising from the possibility of
1) multiple interpretations of the communicating stimulus, 2) multiple
methods of integrating and relating the stimulus to each individual's
own experience, and 3) the resulting multiple interpretations of the
responses.



CHAPTER IV
THE DESIGN OF THE STUDY

The success of this study is contingent upon three aspects.
First, the study must stand upon the acceptance of the logic of the
content and construct validity of the experimental test as given in
Chapter III.

Second, the construct validity must find evidential support in
the statistical results of the analysis of the examinee performance on
an administration of the experimental test. This support can be found
in several ways: a) the advance classification may be found to
reappear in the statistical patterns; b) the content pattern might
be shown not to be an important contributor to the statistical patterns;
c) in the event that the advance classification cannot be supported,
some reasonable method of modifying the advance classification which does
not violate the construct assumptions, such as possible multiple
interpretation of the items, may be found; d) the patterns should
cross-validate between equivalent independent groups; and e) if
cross-validation fails, a reasonable explanation which fits the data and
the construct assumptions must be found to explain this failure.

Third, however much the construct validity is supported, wrong 
answers in some form must also contribute significantly to the prediction 
of achievement scores obtained in the usual manner before they can be 
considered to contain achievement information. 

These three aspects form, in combination, the necessary and
sufficient conditions needed to demonstrate that the method of test
construction used in this study can be used to develop tests which
contain useful information about student performance in the answers
given to the foils. A further restriction on this problem arose: since
the study began with categorical data, it should end with categorical
interpretations in so far as is possible.

To begin with, however, the answer selections on the
experimental test cannot be assumed to have any of the usual continuous
distributions. The selection pattern can be considered to be
categorical, since one choice is made for each item, but not dichotomous.

An expedient method of defining categorical data mathematically
is to treat categorical membership as "one" (1) and nonmembership as
"zero" (0). A matrix of categorical data should have the following
properties:

1. The centroids of normalized clusters from the matrix should
tend to be either orthogonal or opposite each other.

2. The orthogonal projections of the members of a cluster upon
its centroid should be near unity.

3. The orthogonal projections of the members of a cluster upon
the centroids of all other clusters should be either
nonexistent or nonsignificant.
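The 1/0 coding and property 2 above can be sketched as follows, assuming four alternatives labelled A to D and columns ordered item by item (1A, 1B, ..., 30D). The function names are hypothetical illustrations, not the study's own procedure. One consequence worth noting: the four alternatives of one item can never be chosen together, so their columns are always orthogonal (their dot product is zero).

```python
import numpy as np

OPTIONS = "ABCD"  # assumption: four alternatives per item, labelled A-D

def indicator_matrix(responses):
    """Turn an (examinees x items) array of chosen letters into a 0/1
    matrix with one column per alternative, ordered 1A, 1B, ..., kD."""
    R = np.asarray(responses)
    n, k = R.shape
    m = len(OPTIONS)
    X = np.zeros((n, k * m), dtype=int)
    for j in range(k):
        for a, opt in enumerate(OPTIONS):
            X[:, j * m + a] = R[:, j] == opt   # 1 = membership, 0 = not
    return X

def projections_on_centroid(X, members):
    """Projections of the unit-length member columns onto their
    unit-length centroid; property 2 asks that these be near 1.
    A column nobody selected has zero length and gives nan."""
    V = X[:, members].astype(float)
    V = V / np.linalg.norm(V, axis=0)
    c = V.mean(axis=1)
    c = c / np.linalg.norm(c)
    return V.T @ c

X = indicator_matrix([["A", "B"], ["A", "C"], ["B", "B"], ["A", "B"]])
print(X.shape, X[:, 0] @ X[:, 1])  # alternatives 1A and 1B never co-occur
```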

Figure 1 illustrates a typical response matrix which displays 
these properties for the twelve variables included, and may display 
these properties for some reduction of the matrix to less than twelve 
variables. Figure 1, a sample response matrix, is on page 48.

The usual procedure for test analysis is to use the right 
answer portion of Part B and Part C (the total number correct) and 
to treat the wrong answer division of Part B as redundant. 

If all four alternatives of each item are considered the
The solution to this problem used in this study was to partition
the matrix as indicated in Part B of Figure 1 (see p. 48). Two
matrices, one for the right answers and one for the wrong answers, were
made from the original response matrix. The categorical property was
retained within each of these two new matrices. This procedure had the
effect of treating right and wrong answers as though they were
independent.

Part C of Figure 1 (see p. 48) shows the row sum (horizontally)
of the right answer partition of Part B. This sum, which is the
total-correct score, is the usual approach to the interpretation of test
results. It is against Part C that the results of the statistical
analyses of Part B are compared.
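The partition into a right-answer matrix and a wrong-answer matrix, together with the Part C row sum, might look like the sketch below. It assumes the 0/1 matrix has four columns per item in item order and that the key gives each item's correct alternative as a 0-based index; both assumptions and the function name are illustrative. For a 30-item test this yields 30 right-answer columns and 90 wrong-answer columns.

```python
import numpy as np

def partition_right_wrong(X, key, n_options=4):
    """Split a 0/1 alternative matrix into right- and wrong-answer parts.

    X   : (examinees x items*n_options) indicator matrix, item-major order
    key : keyed (correct) option index for each item, 0-based
    """
    X = np.asarray(X)
    k = len(key)
    right_cols = [j * n_options + key[j] for j in range(k)]
    wrong_cols = [j * n_options + a for j in range(k)
                  for a in range(n_options) if a != key[j]]
    right = X[:, right_cols]            # one column per item
    wrong = X[:, wrong_cols]            # n_options - 1 foil columns per item
    total_correct = right.sum(axis=1)   # Part C: the usual total-correct score
    return right, wrong, total_correct

# Two items, three examinees; item 1 keyed A, item 2 keyed C.
X = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 1, 0, 0, 0]])
right, wrong, total = partition_right_wrong(X, [0, 2])
print(total)
```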

There are several possible methods of dealing with categorical
data. Since this study is concerned with relations among categories,
the most reasonable approach is to begin with phi correlation
coefficients between the category pairs. This procedure produced two
correlation matrices, one 30 by 30 for the right answers, and one 90
by 90 for the wrong answers.
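For two 0/1 variables the phi coefficient is identical to the Pearson product-moment correlation, so a 30 by 30 or 90 by 90 phi matrix can be obtained by correlating the columns of the right-answer or wrong-answer matrix. The sketch below shows this with NumPy's np.corrcoef and, for comparison, the explicit 2x2-table form; the function names are hypothetical. A column that nobody (or everybody) chose has zero variance, and its phi values come out as nan.

```python
import numpy as np

def phi_matrix(X):
    """Phi coefficients between all pairs of 0/1 columns of X
    (phi equals Pearson r when both variables are dichotomous)."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

def phi_pair(x, y):
    """Explicit 2x2-table form: (p11 - px*py) / sqrt(px*qx*py*qy)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    px, py = x.mean(), y.mean()
    p11 = (x * y).mean()              # proportion choosing both alternatives
    return (p11 - px * py) / np.sqrt(px * (1 - px) * py * (1 - py))

x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 0, 1]
print(phi_pair(x, y), phi_matrix(np.column_stack([x, y]))[0, 1])
```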

Since the results of these analyses were to be cross-validated,
the original group of examinees was subdivided by random assignment
into two groups (Group A and Group B). The data for both groups were
subjected to the same statistical treatment, although most of the
interpretive work was done with the results from Group A.
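Random assignment to two cross-validation groups can be sketched with a random permutation of examinee indices. The function name and the convention that Group A receives the extra examinee when the total is odd are assumptions; with 277 examinees this convention gives the 139/138 split reported for the sample.

```python
import numpy as np

def random_halves(n_examinees, seed=None):
    """Randomly assign examinee indices to two groups; the first group
    gets the extra examinee when n_examinees is odd."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examinees)
    n_a = (n_examinees + 1) // 2
    return np.sort(order[:n_a]), np.sort(order[n_a:])

group_a, group_b = random_halves(277, seed=1)
print(len(group_a), len(group_b))
```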

The result of this subdivision was that the analytical
aspects of this study began with four phi coefficient matrices (one for
right answers and one for wrong answers for each of Groups A and B).
These four matrices were the basic data for much of this study. They
may be found in Tables 32 to 39 of Appendix A.

The phi matrices gave relationships among pairs of alternatives
only. To proceed further, it became necessary to find relationships
among these relationships. From the original structure of the
experimental test there were two patterns of relationship which could be
sought. The first was the pattern as defined by the advance
classification based on Bloom's Taxonomy; the second was the pattern as
defined by the content (information background) of the items and foils.

One method that could be used to check the data for these patterns
is the Procrustes rotation solution to factor analysis. The procedure
began with the principal axis factor solution and found the best
rotation of this solution, in a least squares sense, for a given
target matrix.
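A least-squares Procrustes rotation can be sketched with the singular value decomposition. The text does not say which Procrustes variant was used, and the later remark that the rotated factors were "nearly orthogonal" suggests an oblique rotation; the sketch below shows only the simpler orthogonal version, which finds the orthogonal matrix T minimizing the sum of squared differences between the rotated loadings A T and a target B (for example, a 0/1 matrix coding the advance classification). Function names are hypothetical.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Orthogonal T minimising ||A @ T - B|| (Frobenius norm).

    A: (variables x factors) loadings, e.g. a principal axis solution.
    B: (variables x factors) target, e.g. the advance classification.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def fit_residual(A, B, T):
    """Sum of squared discrepancies between rotated loadings and target."""
    return float(((A @ T - B) ** 2).sum())

# A target that is an exact rotation of A is recovered exactly.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
T = orthogonal_procrustes(A, A @ Q)
print(np.allclose(T, Q))
```

With real data the target is not an exact rotation of the loadings, and the residual from fit_residual serves as a goodness-of-fit index.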

A factor solution was used to remove as much measurement error
as possible from the further analytic procedures used in this study.
The phi coefficient is extremely sensitive to the marginal proportions,
particularly when the selection ratios deviate considerably from .50, as
in this study where four alternatives are being used. Slight changes in
these proportions can have a profound effect upon specific coefficients.
This effect can be reduced by the factoring procedure, which takes the
relations among coefficients into account.
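One way to see the sensitivity of phi to the marginals is the ceiling it imposes: for two 0/1 variables with selection proportions p_i <= p_j, the largest attainable phi is sqrt(p_i(1 - p_j) / (p_j(1 - p_i))), which equals 1 only when the proportions are equal. The proportions below are illustrative, not values from the study.

```python
import math

def phi_max(p_i, p_j):
    """Largest phi attainable between two 0/1 variables whose selection
    proportions are p_i and p_j (both strictly between 0 and 1)."""
    lo, hi = min(p_i, p_j), max(p_i, p_j)
    return math.sqrt((lo * (1 - hi)) / (hi * (1 - lo)))

# A foil chosen by 25% of examinees against a right answer chosen by 60%.
print(round(phi_max(0.25, 0.60), 3))
```

Under these assumed proportions the two alternatives cannot correlate above about .47, however closely their selections are related.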

The principal axis solution was used to get as much variance as
possible in as few factors as was reasonable.
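A principal axis solution of a correlation matrix is commonly computed by iteration: communality estimates replace the unit diagonal, the largest eigenvalues and eigenvectors of this reduced matrix give the loadings, and the communalities are re-estimated until they stabilise. The sketch below is a generic version of that procedure, not necessarily the exact variant used here. Starting from squared multiple correlations assumes a non-singular matrix, which a phi matrix need not be; the function name and stopping rule are assumptions.

```python
import numpy as np

def principal_axis(R, n_factors, max_iter=1000, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R.

    Starting communalities are squared multiple correlations (SMCs),
    which assumes R is non-singular.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # SMC starting values
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)              # communalities on diagonal
        vals, vecs = np.linalg.eigh(reduced)       # eigenvalues ascending
        top = np.argsort(vals)[::-1][:n_factors]   # keep the largest
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings

# A correlation matrix generated by a single factor is recovered
# (up to the arbitrary sign of the factor).
l = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(l, l)
np.fill_diagonal(R, 1.0)
print(np.round(np.abs(principal_axis(R, 1)[:, 0]), 3))
```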

A third approach normalized the principal axis matrix by rows
and then found the distances between the ends of the resultant vectors
by the usual distance formula. Clusters were then defined in terms of
minimizing within-cluster distances and maximizing between-cluster
distances. The mathematical procedure used in this study is given in
Appendix A (see Table 41, p. 151).
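The row normalization and distance step can be illustrated as below. The study's own clustering procedure is the one given in Appendix A, Table 41; because it is not reproduced here, the sketch substitutes ordinary k-means as a stand-in rule for grouping the normalized rows (reduce within-cluster distances, keep cluster centres apart). All function names are hypothetical.

```python
import numpy as np

def normalize_rows(L):
    """Scale each row of a loading matrix to unit length."""
    L = np.asarray(L, dtype=float)
    return L / np.linalg.norm(L, axis=1, keepdims=True)

def distances(X, Y):
    """Euclidean distances between every row of X and every row of Y."""
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

def cluster_rows(L, k, max_iter=100):
    """k-means on the row-normalised loadings (a stand-in, not the
    Appendix A procedure)."""
    X = normalize_rows(L)
    # Farthest-point start: deterministic and spreads the k seeds apart.
    centers = [X[0]]
    for _ in range(1, k):
        d = distances(X, np.array(centers)).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(max_iter):
        labels = distances(X, centers).argmin(axis=1)   # nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

loadings = [[.8, .1], [.7, .2], [.9, .05], [.1, .8], [.2, .7], [.05, .9]]
print(cluster_rows(loadings, 2))
```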

The advantage of this procedure is that, if a good fit is obtained,
the solution alleviates many of the problems of rotation which are
otherwise inherent in factor analytic solutions.

All three of these solutions can lead to results which can be
interpreted categorically. If a good fit was found with the target matrix
for advance classification, either by process or by content, then the
categories of the original classification were to be used. If, on the
other hand, good fits were not found, then the categorical solution of the
cluster analysis procedure would be studied for possible interpretation
on the basis of either process or content. In this latter case, the data
would also have to show that there were no contradictions to the content
or construct assumptions as given in Chapter II (p. 12 ff); otherwise
this study would not meet the necessary and sufficient conditions
outlined at the beginning of the present chapter on page 46.

An additional advantage of these three procedures is that they
all begin from a principal axis factor solution of a correlation matrix.
Furthermore, if the same factor solution is used in all three cases, the
goodness of fit of the cluster solution can also be found by the
Procrustes method. Hence all three of these solutions can be subjected
to the same criteria.

Since the object of this study was to support the construct
characteristics of the experimental test, the analysis began with an
attempt to find the cluster solution that best fitted these construct
criteria for the data of Group A. It was decided that the best possible
solution would involve having clusters defined by the most frequently
recurring category as defined by the advance classification. The number
of these identifying elements was to be as large as possible for each
solution, i.e. the right answers and the wrong answers. The number of
factors in the principal axis solution needed for this result was then
taken as standard for all solutions involving the same kind of data.
For instance, six factors gave the best solution for the right answers
for Group A. Hence, six factors were used for all right answer analyses.

For cross-validation the identical statistical procedure used
with Group A was repeated with Group B. Cross-validation was then
established once again from a best-fit match (in terms of most
frequently recurring members) between the categorical results for
Group A and Group B. Several procedures were used until a satisfactory
match was found. Once again, the cross-validation could not violate the
construct considerations outlined in Chapter II, page 12 ff, for this
study to be successful.

Finally, the categories which were established as being
potentially meaningful in the earlier parts of the study were used as a
basis for rescoring the experimental test. The resulting subtest scores
were combined in several ways, and their ability to predict
the total-correct scores of two independent achievement tests for the
same subjects was compared with the predictive power of the
total-correct score. In this latter case it would be necessary to show
that the use of wrong answers consistently improved prediction over
the total-correct score and combinations of right answers.
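A comparison of predictive power of this kind can be sketched as the squared multiple correlation (R^2) of an ordinary least-squares regression of an outside criterion on the total-correct score alone and then with a wrong-answer subtest score added. The variable names and the simulated data below are purely hypothetical. Note that in-sample R^2 can never decrease when a predictor is added, which is why a consistent improvement across independent criteria and groups, rather than a single gain, is the meaningful evidence.

```python
import numpy as np

def r_squared(y, *predictors):
    """In-sample R^2 of an ordinary least-squares fit of y on the
    predictors (an intercept is always included)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] +
                        [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Simulated (hypothetical) scores for 100 examinees.
rng = np.random.default_rng(0)
total_correct = rng.integers(5, 25, size=100)
wrong_subtest = rng.integers(0, 10, size=100)
criterion = 2 * total_correct + 3 * wrong_subtest + rng.normal(0, 1, 100)
print(r_squared(criterion, total_correct),
      r_squared(criterion, total_correct, wrong_subtest))
```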

If all these criteria were met, the value of wrong answers as
part of performance information would be demonstrated. With so many
criteria to meet, the probability that such conclusive evidence would
be found is exceedingly low. On the other hand, trends in the
directions indicated could be treated as suggestive. The borderlines
between undemonstrated and suggestive, and between suggestive and
conclusive, are unclear and subject to disagreement.

Statement of the Problem 

Since this study is exploratory, attempting to demonstrate the
presence of information in wrong answers and to discover the major
properties of this information, an elaborate theoretical structure for
formulating testable hypotheses was considered to be unnecessary.
Instead, the procedures suggested for the establishment of grounded
(data-based) theory as outlined by Glaser and Strauss (1967) were used.

Such theory as is used in this study comes from well-established
principles in psychology, communication theory, and test construction
theory. Beginning with the S-O-R paradigm commonly used in
problem-solving studies, it became evident from communication theory that
each of the members of this paradigm may best be considered a composite.
That is, any specific stimulus may be subject to a range of
interpretations. If this stimulus requires the solution of a problem, the
specific interpretations may be subject to a range of solution
strategies, some of which may lead to "correct" and some to "incorrect"
solutions. In the multiple choice test, the examinee can be expected
to try to match the alternatives given him in the item within the
interpretation range and strategy range available to him. In this case,
the most reasonable assumption would be that most, if not all, responses
given by an examinee to a multiple choice achievement test would be
selected on a systematic basis.

If some characteristic of particular alternatives in two
separate items is sufficiently similar to the apparent right solution
in the view of the examinee, he can be expected to choose both of them.
If a sufficiently large number of examinees select this same pair of
alternatives, this joint selection will appear as a high correlation in
the phi coefficient relating the two events, thus becoming "systematic"
in that it would produce a significant statistical event.

In the usual procedure used for scoring multiple choice
achievement tests, only the right answers are treated as systematic in
this sense; hence the requirements usually set for their performance.
This study addressed itself to the exploration of the possibility that
"most if not all of the answers given to multiple choice achievement
tests are selected upon a systematic basis." This psychological
hypothesis is the basic theoretical proposition proposed by this study.

It is possible that wrong answers influence the way in which items
behave, and communication theory suggests that each alternative may
have more than one interpretation among a group of examinees. Thus,
there are four possibilities: 1) that the systematic characteristics
depend upon content; 2) that the systematic characteristics depend upon
the advance classification as defined by the two process models of
Bloom's Taxonomy and the Guidelines; and 3) and 4) that the systematic
characteristics depend upon multiple interpretations based upon
content or upon process. The study could make no commitment to any of
these four possibilities.

For an exploratory study, Q.E.D. can be written at this point
without further attempts at interpretation. The developmental
characteristics of wrong answers, their relationships with personality
variables, with right answers, etc. exceed the scope of this
dissertation. These topics are, of course, legitimate areas for future
research.

The Sample Used 

The experimental test was administered to 277 summer school
students in a one-semester course in educational psychology at the
senior level. The ages of these students ranged from 19 to 55, with a
median age of about 30, and most of the students had some teaching
experience. The overall group was subdivided by random assignment into
two groups (Group A of 139 students and Group B of 138 students). A
t-test for independent samples based on the total-correct scores on the
experimental test, designed to confirm the equivalence of the scores of
these two groups, is reported on page 92.



CHAPTER V 

RESULTS AND THEIR INTERPRETATION 

A somewhat different procedure from the one usually employed was
adopted for this study. The usual procedure for scoring and
interpreting multiple choice achievement tests is to count the
number of items each examinee has correct. This procedure is sometimes
modified by the specification, by various methods, of subtests of the
total test. One of two general procedures is usually employed. Either
the experimenter establishes the categories into which the items fall in
advance of the test administration and then interprets his results on
this basis, or he groups his results on the basis of some analytical
procedure and then endeavours to interpret these groupings. Powell and
Isbister (1969) used the former procedure, and Powell (1968) used the
latter. In general, only right-answer information is used.

The present study endeavours to link advance classification and
statistical classification, and also endeavours to use wrong answers as
well as right answers in the interpretation of test results. As has
already been indicated, very little research of the type just described
is present in the literature. For this reason, this study can best be
described as exploratory, one in which negative results are more likely
to be indicated than positive results.

Each item on the experimental test had four alternatives; hence
the study began with four variables for each item. The response matrix,
therefore, contained a "one" (1) for each alternative selected by each
examinee and a "zero" (0) otherwise. Since the examinee was allowed no
more than one choice per item for 30 items, each examinee would have a
maximum of 30 "ones" in the vector of 120 variables which represented
his selections. Because these selections were further restricted to one
in each group of four, each variable was linearly dependent upon the
other three in the same item. In order to remove these linear
dependencies, the performance matrix was partitioned into a right-answer
matrix and a wrong-answer matrix. These latter two matrices were
subsequently treated as being independent.
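The construction of the response matrix and its partition can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes every item was answered, that the chosen alternatives are coded 0 to 3, and that the three foils of each item are labelled D1, D2 and D3 in the order of the non-keyed alternatives. The function name `partition_responses` is hypothetical.

```python
import numpy as np


def partition_responses(choices, key, n_alternatives=4):
    """Build the 0/1 response matrix and split it into right- and wrong-answer parts.

    choices: (n_examinees, n_items) chosen alternative per item, coded 0..n_alternatives-1
    key:     (n_items,) index of the correct alternative for each item
    Assumes every item was answered (no omissions).
    """
    choices = np.asarray(choices)
    key = np.asarray(key)
    n_examinees, n_items = choices.shape
    items = np.arange(n_items)

    # One indicator per (examinee, item, alternative); exactly one "1" per item
    full = np.zeros((n_examinees, n_items, n_alternatives), dtype=int)
    full[np.arange(n_examinees)[:, None], items[None, :], choices] = 1

    # Right-answer matrix: the keyed alternative of each item (n_items variables)
    right = full[:, items, key]

    # Wrong-answer matrix: the remaining foils of each item
    foil_mask = np.ones((n_items, n_alternatives), dtype=bool)
    foil_mask[items, key] = False
    wrong = full[:, foil_mask].reshape(n_examinees, n_items, n_alternatives - 1)
    # Reorder foil slot first: all D1 columns, then all D2, then all D3
    wrong = wrong.transpose(0, 2, 1).reshape(n_examinees, -1)

    return full.reshape(n_examinees, -1), right, wrong
```

Within each item the four indicator columns of `full` sum to one, which is the linear dependency the text describes; for 30 items the sketch yields a 30-column right-answer matrix and a 90-column wrong-answer matrix.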

In order to attempt to cross-validate the findings, the
examinees were randomly assigned to two groups, Group A and Group B.
All the statistical analysis not related to cross-validation was
performed on the data from Group A. The relationship between the mean
total-correct scores of Group A and Group B is given
in Table

In addition, since a relationship between advance classification
and statistical ordering was being sought, an advance classification
system was used separately for the items as represented by their correct
alternatives and by their foils. These classification systems were
discussed in detail in Chapters II and III.

Since an attempt to find a consistent interpretation of
performance was being made, the examinees were randomly assigned to two
groups so that the interpretations could be examined for
cross-validation. Hence, the basic data for this study consist of two phi
correlation matrices (see: Appendix A, Tables 32 to 39) for each of
the two groups. The correlation matrices represent the intercorrelation
between variables across examinees for the right answers and for the
wrong answers in each group. 3 
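As an illustration of these basic data, the phi coefficient between two 0/1 variables can be computed from their 2 x 2 table; for binary data it coincides with the ordinary Pearson correlation, so a whole phi matrix can be obtained column by column. The helper names below are hypothetical, and the sketch assumes no variable is constant (a constant column leaves phi undefined).

```python
import numpy as np


def phi_2x2(x, y):
    """Phi coefficient of two binary variables from their 2 x 2 contingency table."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    a = np.sum(x & y)    # both selected
    b = np.sum(x & ~y)   # only x selected
    c = np.sum(~x & y)   # only y selected
    d = np.sum(~x & ~y)  # neither selected
    denom = np.sqrt(float((a + b) * (c + d) * (a + c) * (b + d)))
    return (a * d - b * c) / denom


def phi_matrix(X):
    """Phi matrix of all columns of a 0/1 matrix (Pearson r equals phi for 0/1 data)."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
```

Applied to a right-answer matrix this gives a 30 x 30 matrix, and applied to a wrong-answer matrix a 90 x 90 matrix, for each group.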

Finally, two achievement test scores were obtained for each
examinee. One of them was concurrent in the sense that the
experimental test formed a subtest of the midterm examination given in
a one-semester course. The other achievement score was part of the
final examination in the same course. These data were collected so that
the predictive validity of the various interpreted categories and their
predictive cross-validation could be determined.

Several steps were taken in each phase of the analysis. For
instance, attempts were made to interpret the right answers on the basis
of both factor analysis and inter-point distance cluster analysis. This
step was followed by a detailed logico-semantic analysis of the
right-answer clusters in an attempt to interpret these clusters.

A similar logico-semantic analysis was made of the wrong-answer
clusters.

Attempts were then made to cross-validate the advance
classification, the interpreted clusters, and a particular grouping of
the interpreted clusters.

Finally, the predictive validity of the advance classification,
the interpreted clusters, and the grouped clusters was determined. This
validity was found in each case by using the right answers alone, and
the combination of both right and wrong answers.

The discussions which follow adhere to this sequence. 

Interpretation of Right Answers Using Factor Analysis 

On the basis of the advance classification there were two
possible interpretations of the right answers given by the examinees,
based upon either of two independent classification systems.
One of these interpretations could best be described as a
"process" interpretation based upon classification of the items on the
basis of Bloom's Taxonomy. The other possible interpretation was
"content", in which the items were classified on the basis of the
information background required to answer them.

An attempt was made to verify the possible existence of either
or both of these two interpretations. The primary data for this attempt
was a six-factor unrotated principal axis factor matrix derived from the
phi correlations for the right answers. This matrix was rotated by a
Procrustes solution to find the best fit (in a least squares sense) to
two target matrices. The first of these targets specified a simple
structure which indicated the way in which the items were classified
using Bloom's Taxonomy. The second target matrix specified a structure
which indicated which items referred to each of the several reading
selections. This matrix structure was not always simple, since some of
the items referred to more than one selection.
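A least-squares Procrustes rotation toward a target matrix can be sketched as below. The text does not state which Procrustes variant was used; because the resulting primary axes are correlated (Tables 4 and 6), this sketch assumes an oblique solution in the style of Hurley and Cattell: an unconstrained least-squares transformation whose columns are normalized to serve as reference vectors, from which the pattern on the primary axes and the correlations between primary axes follow. The function name `oblique_procrustes` is hypothetical.

```python
import numpy as np


def oblique_procrustes(F, G):
    """Oblique least-squares fit of factor matrix F (n x m) toward target G (n x k).

    Returns the pattern on the primary axes and the correlations between primary axes.
    """
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    # Unconstrained least-squares transformation minimizing ||F @ T - G||
    T, *_ = np.linalg.lstsq(F, G, rcond=None)
    # Normalize the columns of T to unit length so they act as reference vectors
    T = T / np.linalg.norm(T, axis=0)
    # Cosines between the reference vectors
    C_ref = T.T @ T
    # Primary axes are the normalized dual basis of the reference vectors
    C_inv = np.linalg.inv(C_ref)
    d = 1.0 / np.sqrt(np.diag(C_inv))      # cosine of each primary with its reference vector
    primary_corr = C_inv * np.outer(d, d)  # correlations between primary axes
    # Structure on the reference vectors, rescaled to the pattern on the primaries
    pattern = (F @ T) / d
    return pattern, primary_corr
```

In the study `F` would correspond to the 30 x 6 unrotated principal axis matrix and `G` to a target with one column per category; if `F` is an exact rotation of `G`, the sketch returns `G` as the pattern and an identity matrix of primary correlations.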

Table 3 (see: p. 60) gives the target matrix and the pattern on
the primary axes as related to the "process" classification of these
items.

It is evident from the results that the pattern does not
reproduce the target matrix in any satisfactory manner. This finding
suggests that the advance classification of items using Bloom's
Taxonomy did not give a satisfactory indication of the way in
which each item performed. Table 4 (see: p. 61) gives the correlations
between the primary axes in this solution.
TABLE 3

PROCRUSTES ROTATION OF THE ADVANCE
CLASSIFICATION OF RIGHT ANSWERS

[Target matrix and pattern on primary axes not recoverable from the scanned original.]

a. The numbers in italics had the highest loadings.

b. Only those loadings with an absolute value of 1.00 or greater are
shown.

c. The items which are starred (*) approximate the target.
TABLE 4

PROCRUSTES ROTATION OF THE ADVANCE
CLASSIFICATION OF RIGHT ANSWERS

Correlation Between Primary Axes

         I       II      III     IV      V
I       1.00
II      -.96    1.00
III     -.93     .04    1.00
IV       .80    -.91    -.83    1.00
V        .95   -1.00    -.94     .90    1.00

The primary axes (Table 4) were highly correlated, suggesting
that, by this classification system, there may be only one factor
present.

As indicated on page 88, an identical procedure was used to
examine the data for the possible presence of "content" factors.
Table 5 (see: p. 62) gives the target matrix and the pattern on the
primary axes for this Procrustes rotation solution.

The fit of this matrix to the target based on content is only
slightly better than for the process-oriented advance classification.
The loadings of items 21, 24, and 25 on Factor V show a nearly simple
structure which coincides with the target matrix. These three items
also formed a unique cluster in the cluster analysis conducted later
in the study. It is possible, however, to give a process
interpretation to this cluster, which may mean that this pattern
for content is coincidental.

Aside from these items the pattern did not reproduce the target 



TABLE 5

PROCRUSTES ROTATION OF THE INFORMATION
CONTENT OF RIGHT ANSWERS

[Table body not recoverable from the scanned original. Columns: Item No.; Target Matrix (reading selections: Stupidity, Awareness, Aggression, Discipline, Progress); Pattern on Primary (axes I to V).]

a. The numbers in italics had the highest loadings.
b. Only those loadings with an absolute value of [illegible] or greater are shown.


matrix in an acceptable manner. Content would seem to be only slightly
better than process as a means of classifying items in advance of their
use.

Table 6 shows the correlations between the primaries. 



TABLE 6

PROCRUSTES ROTATION OF THE INFORMATION
CONTENT OF RIGHT ANSWERS

Correlation between Primary Axes

         I       II      III     IV      V
I       1.00
II        ?     1.00
III     -.76    -.39    1.00
IV      -.52    -.81     .58    1.00
V        .30    -.12    -.46    -.08    1.00

? Value illegible in the scanned original.



The Interpretation of Right Answers Using Cluster Analysis

The negative results just reported suggested the need to search
for a multiple interpretation possibility. Hence, a cluster analysis
procedure was applied which normalized by rows the same factor matrix
as was used in the two solutions just given. The normalization involves
dividing each member of a row in the factor matrix by the square root of
the communality. Since this value is the length of the vector given by
the row of factor loadings, this division makes this length unity (one).
The procedure then calculates the interpoint distances between the
ends of the vector pairs, surface-to-surface, across the hypersphere.
The square of this distance is the sum of the squares of the differences
between the two rows of each pair. This interpoint distance is then used
to form clusters in which the within-cluster distances are minimized, as
indicated by the formulae given in Table 41, p. 151. The clustering
begins with as many clusters as variables and ends with all variables in
one cluster. In addition, there is a unique cluster solution for each
factor solution which might be used, that is, with the inclusion of each
additional factor. The experimenter, therefore, was left with the
problem of determining which of many possible solutions to choose.
Repeated attempts suggested an advantage to the process classification
of Bloom's Taxonomy. It was decided to consider a cluster to have
recapitulated the advance classification if it contained more members
from one particular advance category than from any other category. The
solution which gave the best recapitulation was then sought, by
iteration, for both right- and wrong-answer clusters. Each cluster was
then identified on the basis of the recapitulated category.
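The row normalization and interpoint distances described above can be sketched as follows. The study's within-cluster criterion is given by formulae in Table 41 that are not reproduced here, so the clustering step substitutes Ward's minimum-variance agglomerative method from SciPy as an assumed stand-in; like the procedure in the text, it starts with one cluster per variable and merges until a single cluster remains. The function names are hypothetical. Because the normalized rows have unit length, the squared interpoint distance between two items equals 2 - 2 cos(theta), where theta is the angle between their factor vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def normalize_rows(F):
    """Divide each row of the factor matrix by its length, the square root of its communality."""
    F = np.asarray(F, dtype=float)
    h = np.sqrt(np.sum(F ** 2, axis=1))
    return F / h[:, None], h


def squared_interpoint_distances(U):
    """Sum of squared differences between every pair of (normalized) rows."""
    diff = U[:, None, :] - U[None, :, :]
    return np.sum(diff ** 2, axis=2)


def cluster_items(F, n_clusters):
    """Agglomerative clustering of normalized rows (Ward's method as a stand-in criterion)."""
    U, _ = normalize_rows(F)
    Z = linkage(U, method="ward")  # merges from n singletons down to one cluster
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Cutting the same tree at different numbers of clusters, and repeating the analysis for factor matrices of different sizes, produces the family of solutions among which, as the text explains, a choice still has to be made.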

For the right answers, in this sense, the best solution was
derived from an unrotated principal axis factor matrix of six factors.
In this solution twelve of the thirty items recapitulated in the
clusters. This result is four times better than the Procrustes rotation
to fit content just reported. Since this was 40 per cent of the items,
in spite of multiple interpretation possibilities, the result was
reasonably satisfactory.
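The recapitulation rule stated above (a cluster recapitulates an advance category when it contains more members of that category than of any other) can be expressed as a short count. This sketch assumes that the figure reported in the text is the number of items belonging to the plurality category of recapitulating clusters, with ties counting as no recapitulation; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def recapitulated_items(cluster_labels, advance_categories):
    """Count items whose advance category is the strict plurality category of their cluster."""
    members = defaultdict(list)
    for label, category in zip(cluster_labels, advance_categories):
        members[label].append(category)

    total = 0
    for categories in members.values():
        ranked = Counter(categories).most_common()
        # Recapitulation requires one category to outnumber every other category
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            total += ranked[0][1]
    return total
```

With 30 items, a count of 12 corresponds to the 40 per cent reported above.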




An examination of the data suggested that the first unrotated
factor might be a "difficulty" factor. Table 7 (see: p. 66) expands upon
this relationship.

In general, the value of the loading on Factor 1 seems to be
about 50 per cent of the value of the difficulty. The correlation
between these two variables was r = .65.
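The two summaries in this paragraph, a correlation and a rough ratio of loading to difficulty, can be computed as below. Expressing "about 50 per cent" as a least-squares slope through the origin is an assumption made for illustration, not the author's stated method, and the function name is hypothetical.

```python
import numpy as np


def loading_vs_difficulty(difficulty, loading):
    """Pearson r between difficulty and Factor 1 loading, and the slope of loading on difficulty through the origin."""
    p = np.asarray(difficulty, dtype=float)
    a = np.asarray(loading, dtype=float)
    r = np.corrcoef(p, a)[0, 1]
    ratio = np.sum(p * a) / np.sum(p * p)  # least-squares fit of a = ratio * p
    return r, ratio
```

A ratio near 0.5 would correspond to the observation that the loading is roughly half the difficulty.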

Does this finding seriously disrupt the use of the six-factor
solution? Table 8 (see: p. 65) compares the inter-point distance
clusters as determined with and without Factor 1.



TABLE 8

RIGHT ANSWER CLUSTERS
WITH AND WITHOUT
FACTOR 1

[Cluster memberships with and without Factor 1 not recoverable from the scanned original.]

a. The numbers in italics were displaced with the removal of Factor 1.






TABLE 7

RELATIONSHIP BETWEEN ITEM CONSISTENCY, ITEM
DIFFICULTY AND ITEM FACTOR LOADING ON
UNROTATED FACTOR 1

Item      Internal
Number    Consistency   Difficulty   Factor 1
 1          .384          .086        -.007
 2          .456          .245         .150
 3          .304          .345         .176
 4          .425          .173         .037
 5-9       (rows illegible in the scanned original)
10          .320          .?73         .074
11          .334          .475         .216
12          .430          .34?         .181
13          .378          .669         .118
14          .405          .460         .114
15          .015          .050        -.581
16          .447          .813         .314
17          .546          .079         .194
18         -.295          .014        -.732
19          .463          .309         .055
20          .269          .532         .055
21          .404          .324         .300
22          .001          .072        -.58?
23          .165          .583        -.084
24          .377          .856         .402
25          .329          .547         .340
26          .140          .381         .101
27          .388          .849         .414
28          .378          .367         .048
29          .275          .827         .243
30          .158          .734         .182

? Digit illegible in the scanned original.



The italics in Table 8 on page 65 indicate that seven items move
when Factor 1 is dropped. This fact does not seriously affect the
replication of the advance classification in the new solution. Also,
most of the items which move to a new cluster do not follow the general
rule that the difficulty be roughly twice the factor loading. For these
reasons, it was decided to retain the complete six-factor solution
throughout subsequent analyses.

Since a Procrustes rotation was used to determine how well the
advance classifications fit the data, it was reasonable to use the same
procedure with the cluster analysis data. Table 9 (see: p. 68) reports
the target and pattern matrices in this solution.

The pattern on the primary axes in Table 9 is a good reproduction of the
simple structure of the target matrix. Furthermore, if the values of
"h" (i.e. the square root of the communality from the six-factor matrix)
are taken into account, the pattern seems to fit even better. The
cluster analysis was produced from the interpoint distances derived from
a normalized matrix. For this reason, the finding that the largest
loading of each item approximates the overall length of its vector in
the unrotated six-factor solution adds to the acceptability of the
solution.

In short, the cluster solution was a far better fit to the data
than either of the two methods of advance classification. The
independence of the interpoint distance clusters from rotation problems
also reinforces the acceptability of this approach. Table 10 (see: p. 69)
reports the correlations between the axes.

The relatively low correlations between the axes in Table 10
further strengthen the support for the use of interpoint distances as
an analytical technique for the problem under examination in this study.



TABLE 9

PROCRUSTES ROTATION OF THE INTERPRETABLE CLUSTERS OF THE RIGHT ANSWERS

Item    h      Target Matrix     Pattern on Primary
No.            (1.00 on)         (legible loadings)
 1     .40        C1                 ?
 2     .55        C1                 ?
 8     .62        C1                .63
28     .71        C1                .62
 7      ?         C5                .52
22     .71        C5                .77
23     .53        C5               -.4?
 9     .52        C6                .48
27     .71        C6                .49
10     .57        C7                .4?
11     .48        C7                .46
16     .60        C7                 ?
 ?     .28        C7                .23 (c)
12     .70        C8                .68
20     .57        C8                .46
26     .61        C8                .40
21     .61        C10               .60
24     .68        C10              -.49, .58
25     .62        C10               .5?

a. The symbol "h" stands for the square root of the communality.

b. Only those loadings greater than an absolute value of .35 are reported.

c. This loading, although less than .35, is reported because of its low value for "h".

? Entry illegible in the scanned original.
TABLE 10

PROCRUSTES ROTATION OF THE
CLUSTER ANALYSIS OF THE
RIGHT ANSWERS

Correlation between primary axes

         C1      C5      C6      C7      C8      C10
C1      1.00
C5      -.31    1.00
C6      -.07     .18    1.00
C7      -.17     .31    -.03    1.00
C8       .06    -.29    -.03    -.08    1.00
C10      .12     .20    -.15    -.12     .09    1.00


The largest correlation between axes given in Table 10 was -.31, which
represents an angle of more than 70° between this pair of axes. It is
possible, therefore, to state that interpoint distance clusters produced
a solution which was independent of the usual rotation problems and
which was approximately orthogonal. The procedure gave a very
satisfactory statistical representation of the data. On the other hand,
the failure of the advance classification systems to make the clusters
interpretable left the researcher with the problem of interpreting them.
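The conversion from a correlation between axes to an angle rests on the fact that the correlation between two unit-length axes is the cosine of the angle between them; taking the absolute value gives the acute angle between the axes regardless of sign. A minimal sketch (the function name is hypothetical):

```python
import math


def angle_between_axes(r):
    """Acute angle, in degrees, between two axes whose correlation (cosine) is r."""
    return math.degrees(math.acos(abs(r)))


print(round(angle_between_axes(-0.31), 1))  # 71.9
```

An angle of about 72 degrees for the largest correlation is what supports the description of the solution as approximately orthogonal.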

Finally, a matrix which is in fact categorical in the sense
given on page would be expected to have low correlations between the
primary axes for a Procrustes rotation. When the dimensionality of this
categorical matrix has been reduced by factor analytic techniques
before rotation, the best fit from a Procrustes rotation should display
the retained lengths of the vectors (i.e. the square roots of the
communalities) as the loadings of the pattern on the primary axes.
Also, the pattern on the primary axes should display a structure similar
to the target matrix. All three of these properties must be present for
the inference to be made that the principal axis matrix is categorical.
It is evident from Tables 9 and 10 (pages 68 and 69) that all these
conditions were met, strongly suggesting that the cluster analysis
technique gives a good categorical solution to the six-factor data. The
low internal consistency was thus explained, suggesting that profile
scores from the clusters may describe the data better than total-correct
scores alone. Finally, the close match between the original vector
lengths and the factor loadings in the Procrustes rotation, together
with the near orthogonality of the factors, suggests that this solution
clearly identifies the homogeneous subtests of the right answers. These
findings contradict the apparently poor showing of this test in the
usual analytical setting.
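The three conditions listed above can be checked mechanically, as in the sketch below. The tolerances (`max_corr`, `h_tolerance`) are hypothetical values chosen for illustration; the text gives no numerical thresholds, and the function name is likewise assumed.

```python
import numpy as np


def looks_categorical(pattern, primary_corr, h, target,
                      max_corr=0.35, h_tolerance=0.15):
    """Check the three properties of a categorical solution (thresholds are illustrative)."""
    pattern = np.asarray(pattern, dtype=float)
    primary_corr = np.asarray(primary_corr, dtype=float)
    h = np.asarray(h, dtype=float)
    target = np.asarray(target, dtype=float)

    # 1) Low correlations between the primary axes
    off_diag = primary_corr[~np.eye(len(primary_corr), dtype=bool)]
    low_corr = np.all(np.abs(off_diag) <= max_corr)

    # 2) Each item's largest loading approximately equals its retained length h
    largest = np.max(np.abs(pattern), axis=1)
    matches_h = np.all(np.abs(largest - h) <= h_tolerance)

    # 3) The largest loading lies in the column the target assigns the item to
    same_structure = np.array_equal(np.argmax(np.abs(pattern), axis=1),
                                    np.argmax(target, axis=1))

    return bool(low_corr and matches_h and same_structure)
```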

The Meaningful Interpretation of Item Clusters 

With the failure of the advance classification, the multiple
interpretation hypothesis was the only alternative to investigate;
hence, if answering was systematic, some reasonable common ground was
needed for each cluster. The first possibility which had to be either
confirmed or eliminated was that the clusters were sufficiently strongly
content oriented to suggest content as a basis. The only cluster in
which content was at least a strong contender to process was Cluster
C10, containing items 21, 24, and 25. All these items were based upon
reading selection number five (Progress), but all three were also
classified as Analysis items. A closer look was taken at this cluster,
along with all the others (see: Appendix C). This look supported the
process interpretations over the content interpretation. Thus, it was
reasonable to reject content as a basis for the interpretation of any of
the right-answer clusters.

The misfit items in the identified clusters were examined by
logico-semantic (structure-meaning) analysis to determine if they could
be reasonably reclassified in common with the recapitulating category.
Success by this procedure would lend support to the multiple
interpretation hypothesis. All misfit items in three of the identified
clusters (subscripts illegible in the source), except possibly item 7,
could be reclassified to fit the overall classification of these
clusters, adding three or four items to the original twelve.

The logico-semantic analysis of Cluster C8 revealed that the
formation of an inductive structure within a particular reading
selection could possibly be the basis for synthesis items. For this
reason C8 was classified as Synthesis, adding another three items to the
support of the multiple interpretation possibility.

This procedure gave 19 of the 30 items, or 63 per cent, a
reasonable classification based upon process. Since the interrater
reliability was only moderately high (r = .83) and since the
cross-validation based upon interpretable reasons found in Powell (1968)
was only 64 per cent, this level of recapitulation can be considered
satisfactory.

For the remaining four clusters the interpretation was ambiguous.
Cluster C9, containing items 15 and 18, seemed to consist simply of poor
items. Of the other three ambiguous clusters (C2, C3, and C4 in Table 11),
one seemed to involve implication or extrapolation from the relevant
content, suggesting comprehension, but in the absence of better
definitions for strategies it was safer to call this cluster ambiguous
(see: p. 197). The remaining two clusters seemed to be strongly
influenced by the nature of the foils in the items, which tended, in
general, to lower them to Comprehension items; but each had some
Analysis characteristics, leaving their classification ambiguous, which
seemed to further support the multiple interpretation hypothesis. These
decisions are summarized in Table 11 (see: p. 73).

Trying to make Synthesis items by combining two or more reading
selections proved unsuccessful for a number of reasons, suggesting the
need for more research on this method.

Thus, the results of the logico-semantic analysis of the item
clusters suggested reasonable support for 1) the multiple interpretation
hypothesis, 2) the transcendence of process over content, and 3) the
suggestion that foils influence item performance.

The summary just presented is supported by a detailed discussion
given in Appendix C (see: p. 186 ff.). The details are presented in
full, first, because it was felt that the effective reclassification
represented evidential support for the multiple interpretation
hypothesis, and second, because subsequent researchers might find value
in an independent evaluation of the logic behind the conclusions of this
study.

The Meaningful Interpretation of Wrong Answer Clusters

The tables for the factor analysis of the wrong answers were
very large since they involved 90 variables (see: Appendix A, Tables 34
to 39).⁶ The analysis of the right answers showed that interpoint
distance cluster analysis gave a good representation, in a statistical


The phi coefficients in these tables are in the following
sequence: variables 1 to 30 represent 1D1 to 30D1; variables
31 to 60 represent 1D2 to 30D2; and variables 61 to 90
represent 1D3 to 30D3.
TABLE 11

CLASSIFICATION AND MEMBERSHIP OF
RIGHT ANSWER CLUSTERS AS
DERIVED FOR GROUP A

Cluster   Item              Advance           Interpreted
Label     Membership        Classification    Classification
C1        1, 2, 8, 28       Analysis          Analysis
C2        3, 17, 30                           Ambiguous
C3        4, 6?, 13                           Ambiguous
C4        5?, 14, 19                          Ambiguous
C5        7, 22, 23         Analysis          Analysis
C6        9, 27             Analysis          Analysis
C7        10, 11, 16, 29    Evaluation        Evaluation
C8        12, 20, 26                          Synthesis
C9        15, 18                              Ambiguous
C10       21, 24, 25        Analysis          Analysis

a. The numbers in italics were all in the category named in the
advance classification.

b. The "interpreted classification" of each cluster is given in
Appendix C beginning on page 186.

? Item number uncertain in the scanned original.



sense, for those data. Finally, the interrater reliability was lower
for wrong answers than for right answers, suggesting that the advance
classification of foils would be less likely to reappear in the data
than that of the right answers. For these three reasons, the
interpretation of the wrong answers began with the cluster analysis.
Attempts to fit the unrotated principal axis factor matrix to the
advance foil classification and to the foil content were not made.
There was no reason, on the basis of the characteristics of the results
of the cluster analysis, to assume that the results of these two
preliminary steps would have been substantially different for the wrong
answers than they were for the right answers.

The best replication of the advance foil classification which
could be found in the cluster analysis involved a 25-factor solution to
the phi correlation matrix of the wrong responses, and 15 clusters of
foils in this solution. This replication placed 28 of the 90 foils
(or 31 per cent) into clusters which might be considered equivalent to
the categories of the advance classification on the basis of the most
frequently occurring category of foil in that cluster. This proportion
(31 per cent) was not quite as good for the wrong answers as the
corresponding proportion (40 per cent) was for the right answers.

In getting this best replication, the same procedure was used 
for determining the number of clusters as was used for right answers. 

All 90 foils were clustered. Once the best replication was
found, however, those foils for which the selection ratio was less than
.06 were dropped from further consideration. This precedent was
established when the interpretation of the right-answer clusters was
being made. The result of dropping foils of low selection ratio
was to reduce the number of foils under consideration from 90 to 60.
Of these 60 foils, 18 (or 30 per cent) continued to meet replication
requirements. The proportion is essentially the same as before.
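The screening of foils by selection ratio can be sketched as follows, assuming that the selection ratio of a foil is the proportion of all examinees in the group who chose it (the text does not define the denominator). The function name and the foil labels in the example are hypothetical.

```python
import numpy as np


def drop_rare_foils(wrong, foil_labels, threshold=0.06):
    """Keep only foils whose selection ratio is at least `threshold`."""
    wrong = np.asarray(wrong)
    ratio = wrong.mean(axis=0)  # proportion of examinees selecting each foil
    keep = ratio >= threshold
    kept_labels = [label for label, k in zip(foil_labels, keep) if k]
    return wrong[:, keep], kept_labels, ratio
```

For example, in a group of 100 examinees a foil chosen by 5 examinees (ratio .05) would be dropped, while one chosen by 7 (ratio .07) would be retained.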

The interrater reliability for foil classification (r = .62)
was not as high as for the right answers. It is possible that this
figure might have been considerably lower had the right answers to the
items not been clearly indicated to the raters at the time of the rating.
The rating procedure involved comparing the foil, stem, and right
answer relationships to the definitions of foil categories as listed in
Chapter III. It became fairly evident that at least some of the foils
might be placed quite reasonably into several different categories. The
problem of multiple classification of foils will be dealt with in more
detail in the interpretation of the wrong-answer clusters
(see: Appendix G, p. 209 ff.).

Briefly, the advance classification categories of foils were arranged
into three or four possible general classes. These general classes were
1) Strategy Errors, 2) Misreading, 3) Misinformation, and 4) Other.
The Other (O) category was for foils which for some reason could not
be readily classified into some established category. In the
Experimental Test this category referred primarily to the foils for
items 19 to 24 inclusive. In these items an item format different from
that of the remainder of the items was used. It was not possible, in
advance, to know whether or not this difference in item format would
influence the way in which the foils behaved statistically. It was
assumed in the interpretation of the clusters that "O" type foils which
were found to be in reasonable association with foils of specified
categories were not influenced in their behavior by the item format. If
the Other (O) type foils formed their own unique clusters, these
clusters were assumed to represent categories of foil not identified by
the advance classification of foils. Two such categories appeared in the
data.

The specific categories used in the advance classification of
foils were as follows:

       Name                       Symbol
 1.    Overgeneralization         OG
 2.    Oversimplification         OS
 3.    Substitution               Sub
 4.    Inversion                  Inv
 5.    Invalid Assumption         IA
 6.    Irrelevancy                Irr
 7.    Common Misconception       CM
 8.    Word-Word Link             WW
 9.    Transposition              Tr
10.    Redefinition of Terms      RT
11.    Other                      O


A cluster was identified on the basis of the most frequently
occurring foil of a particular advance classification in that cluster.
The identification given in Table 12 (see: p. ??) is based on all of
the foils in each cluster before low selection ratio foils were
eliminated. This identification was used as a starting point for the
attempt at meaningful interpretation of each foil cluster. The final
meaningful interpretation of each cluster is also given in Table 12.

Once again the members of each cluster were examined in an
attempt to determine the common basis upon which these foils clustered
together. Particular attention was paid to the foils which were not in



TABLE 12

CLASSIFICATION AND MEMBERSHIP OF WRONG ANSWER CLUSTERS AS DERIVED FOR GROUP A

Columns: Wrong Answer Cluster (W1 to W15); Foils in Each Wrong Answer Cluster; Identification by Advance Foil Classification; Interpretation by Logico-semantic Analysis.

[Table body not recoverable from the scanned original.]

a. Foils indicated by italics were dropped from the analysis on the basis of a low selection ratio after
the factor analysis was completed.

b. The U stands for an "unclassified" cluster.
the common advance category present in the cluster. In addition, the
evident influence of foils upon the performance of an item led to the
possibility that the interpretation of a foil by category might depend
upon the point of view of the interpreter. The low interrater reli-
ability suggested this possibility. For at least some foils several
classifications may have been reasonable. In this case the foil cat-
egories would not be independent or unambiguous. If foils are only
interpretable after a test has been given, then these interpretations
become specific to the particular examinees upon whose performance the
interpretations were based. That is, the expectation would be that
where multiply defined foils were concerned, the proportion of cross-
validating foils on a cluster-for-cluster basis would be low.

A procedure similar to the one employed for item clusters was
used with wrong answer clusters. For complete details see Appendix C
(see: p. 209 ff.).

The possibility of multiple interpretations of wrong answers was
most clearly illustrated in the case of foil 2D1. Briefly, the classifi-
cation of this foil was initially set as Substitution (see: p. 156)
because it substituted a conjunctive for a disjunctive relationship.
However, it is possible to look at these relationships as Overgeneraliza-
tion or Oversimplification, as the discussion in Appendix C indicates
(see: p. 213).

Considerably closer analysis of the logico-semantic relationships
within items and their components would probably reveal a range of
possible interpretations for each item. This multiple interpretation
effect is consistent with the findings using right answers. For foil
clusters, the lower recapitulation left five clusters containing 28
foils from the 60 that remained classifiable because of the advance
classification of their members. Of these, ten foils did not have these
classifications in advance, but eight of them could be reasonably
reclassified into the category of the total cluster.

Only one cluster (W7), which contained nine foils, could not be
classified by logico-semantic analysis, although the interpretation of
one or two more was shaky at best. One cluster (W3) was classified
Common Misconception on the basis of logico-semantic analysis because
the characteristics of this cluster were reminiscent of the findings of
Powell and Isbister (1969), thereby suggesting consistency between two
independent groups on the same test. Three categories (NS, O1, O2) were
added to the Guidelines because of the clustering as well as the
logico-semantic analysis.

Clearly, the Guidelines were neither exhaustive nor mutually
exclusive, as already anticipated by the establishment of the "Other"
classification and by the multiple interpretation hypothesis. Because
of the support for this hypothesis, it is likely that the interpretations
formulated should be confined to the group from which they were derived.

A Possible Hierarchy of Foil Categories

During the above analysis it became evident that foil categories
were, to a degree, interchangeable. For instance, foil 2D1 could be
classified as OG, OS, or Sub. In only a few cases did the reclassification
of specific foils within a wrong answer cluster seem unreasonable. No
attempt was made to reclassify foils exhaustively. Instead, the attempt
was to reclassify specific foils to fit the pattern which seemed to be
evident within the entire cluster as derived from Group A. Table 13 on
page 80 summarizes the reclassification which occurred for each foil
class in each of the wrong answer clusters.



TABLE 13

RECLASSIFICATIONS WHICH WERE
MADE OF SPECIFIC FOILS

Cluster    Advance Classification                 Reclassification

W1         Sub                         became     OG
W3         Sub, OS and OG              became     CM
W5         OS and OG                   became     Inv
W6         Inv, Sub, OS, Irr           became     NS
W8         Inv and Sub                 became     IA
W9         Irr, Inv, and Sub           became     OS
W11        Irr and OS, Sub, CM         became     WW
W13        OG                          became     O2




Since the O class in Table 38 was treated essentially as though
it were unclassified, it was omitted from this table, as were foils which
did not classify within their respective cluster. Table 13 lists the
advance classification of the foils in each of the wrong answer clusters
which were different from the final classification of that cluster, and
the final classification given. As such, it summarizes Tables 53 to 66
inclusive.

Put the other way around, the Substitution category of foils
disappeared by reclassification as OG, CM, NS, and IA. Besides
retaining a cluster for itself, OG also reclassified as CM, Inv, and O2.
OS also retained its own category but became WW, CM, Inv, and NS as
well. Similarly, some of the Inv foils reclassified to NS, IA, and OS.
Finally, some Irr foils became NS, OS, and WW. Figure 2 presents these
changes diagrammatically.




FIGURE 2 
FOIL RECLASSIFICATION PATTERN 

In Figure 2 there is a double-headed arrow between OS (Over-
simplification) and Inv (Inversion). This arrow means that at least one
OS foil was reclassified as an Inv, and at least one Inv foil was
reclassified as an OS (Oversimplification). The other arrows in the
figure can be interpreted in the same way.

Sub foils disappeared, and hence are dropped from this diagram.
From the analysis of the right answer clusters it became evident that
CM and WW type foils tended to lower the level of an item in the
taxonomy, while OS and OG foils tended to raise it.

From the reclassification pattern in Figure 2, OS foils seem to
be pivotal in the sense that they were most frequently involved in changes
(from or to). Furthermore, NS, Inv, and OG may be higher order foils,
and CM, WW, and Irr lower order foils, on the basis of the changes
these foils seem to cause in the reclassification of right answers.
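The claim that OS is pivotal can be checked by counting, for each category, how many distinct reclassification links it takes part in. The sketch below is a minimal illustration, assuming the rows of Table 13 are transcribed correctly (the blank entry for W13 is read as O2, as the text states) and that Sub is left out because, as in Figure 2, it disappeared. The names `TABLE_13`, `reclassification_edges`, and `involvement` are hypothetical.

```python
from collections import defaultdict

# Rows of Table 13: advance classifications -> final classification of the cluster.
# The blank reclassification for W13 is read as O2, following the text.
TABLE_13 = {
    "W1": (["Sub"], "OG"),
    "W3": (["Sub", "OS", "OG"], "CM"),
    "W5": (["OS", "OG"], "Inv"),
    "W6": (["Inv", "Sub", "OS", "Irr"], "NS"),
    "W8": (["Inv", "Sub"], "IA"),
    "W9": (["Irr", "Inv", "Sub"], "OS"),
    "W11": (["Irr", "OS", "Sub", "CM"], "WW"),
    "W13": (["OG"], "O2"),
}


def reclassification_edges(table, dropped=("Sub",)):
    """Directed (from, to) category changes, leaving out dropped categories as Figure 2 does."""
    edges = set()
    for sources, target in table.values():
        for source in sources:
            if source not in dropped and source != target:
                edges.add((source, target))
    return edges


def involvement(edges):
    """Number of distinct changes (from or to) each category takes part in."""
    counts = defaultdict(int)
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


edges = reclassification_edges(TABLE_13)
counts = involvement(edges)
two_way = sorted({tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges})
print(max(counts, key=counts.get), counts["OS"])  # OS 6
print(two_way)  # [('Inv', 'OS')]
```

With these rows OS takes part in six distinct changes, more than any other category, and the only two-way link is between OS and Inv, which corresponds to the double-headed arrow described for Figure 2. IA and O2 appear only as targets, matching their one-way linkages.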

Thus the general order of a possible foil hierarchy would seem 
to follow a vertical axis in Figure 2, with the lowest level at the 
bottom. 

O and IA foils are ambiguous in this pattern because they have
one-way linkages only.

Other evidence to support the possibility of a hierarchy of
foils should be found before this property of foils can be considered to
be established from the data. Any set of random variates can be ordered
on the basis of relative magnitude into an arbitrary hierarchy, but two
hierarchies built from independent random variables will, on average, be
uncorrelated. If, however, two correlated hierarchies can be produced
from two independent variables, this production would suggest that these
two variables may be functionally related.
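The argument rests on the rank-order (Spearman) correlation between two hierarchies. As a minimal sketch, assuming tied values receive the mean of the ranks they occupy, rho can be computed as the Pearson correlation of the two rank vectors; the function names are illustrative rather than part of the study.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Two orderings that agree give +1, reversed orderings give -1, and orderings drawn independently of each other scatter around zero; a clearly positive value is what would suggest the functional relationship described above.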

One possible source of such an arbitrary hierarchy is to consider
the average total-correct score of the respondents selecting each inter-
preted foil, as listed in Table 40 of the Appendix. The average of these
averages could be found for each interpreted wrong-answer cluster. This
average total-correct score for all the members of a particular cluster
may reflect a systematic characteristic of the examinees which may
relate the kind of "error" made to the total-correct score achieved.
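The "average of these averages" can be sketched as follows. All foil names, scores, and cluster memberships below are hypothetical; the sketch only shows the order of operations, first a mean over the examinees who chose each foil and then a mean over the foils in a cluster, followed by ranking the clusters from highest to lowest.

```python
# Hypothetical data: total-correct scores of the examinees who chose each foil.
chooser_scores = {
    "foil_a": [11, 14, 12],
    "foil_b": [13, 12],
    "foil_c": [10, 12, 11, 15],
}
# Hypothetical membership of two interpreted wrong-answer clusters.
clusters = {"Wa": ["foil_a", "foil_b"], "Wb": ["foil_c"]}


def mean(xs):
    return sum(xs) / len(xs)


def cluster_average(foils, scores):
    """Average over a cluster's foils of each foil's average chooser score."""
    return mean([mean(scores[f]) for f in foils])


averages = {name: cluster_average(foils, chooser_scores) for name, foils in clusters.items()}
ranking = sorted(averages, key=averages.get, reverse=True)  # rank 1 = highest average
print(averages, ranking)
```

Averaging the foil averages weights each foil equally, however many examinees chose it; pooling all choosers in cluster Wa instead would give 62/5 = 12.4 and weight popular foils more heavily.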

If foil selection patterns reflect a functional relationship
between "errors" and total-correct score, the rank of a foil (from one to
three, for high to low) within an item, based on the average total-correct
score, should also reflect this functional relationship. If each item is
independent of every other item, the average within-item rank of the foils
across a cluster should also be independent of the average total-correct
scores of the foils in that same cluster. This is a reasonable assump-
tion since only in item cluster does there not seem to be a close
relationship between item clusters and foil clusters for the items in
that cluster. An exception to this expectation would arise in the event
that there is, in fact, a functional relationship between the kind of
error and the total-correct score. In this latter case, both these
procedures should produce roughly the same hierarchy.
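The within-item ranking can be sketched as follows: within each item, options are ranked by the mean total-correct score of the examinees who chose them (1 = highest), and a cluster's value is the mean rank of its member foils. The scores and option labels are hypothetical. Because footnote a of Table 14 states that its within-item rank includes the right answer, the keyed option (marked "*") is ranked here too; omitting it would give the one-to-three foil ranking described in the text.

```python
def within_item_ranks(option_means):
    """Rank one item's options by mean chooser score: 1 = highest; ties share the mean rank."""
    ordered = sorted(option_means, key=option_means.get, reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        while end + 1 < len(ordered) and option_means[ordered[end + 1]] == option_means[ordered[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[ordered[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


# Hypothetical mean total-correct scores of choosers; "*" marks the keyed answer.
items = {
    1: {"A*": 14.0, "B": 11.5, "C": 12.2, "D": 10.1},
    2: {"A": 10.0, "B*": 13.0, "C": 12.0, "D": 12.0},
}
item_ranks = {item: within_item_ranks(means) for item, means in items.items()}

# A hypothetical foil cluster made of option C of item 1 and option C of item 2.
cluster = [(1, "C"), (2, "C")]
cluster_rank = sum(item_ranks[i][opt] for i, opt in cluster) / len(cluster)
print(item_ranks, cluster_rank)  # cluster_rank is 2.25
```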

In addition, this hierarchy would be expected to reflect the
pattern which seemed to be evident on the basis of the pattern of re-
classification of foils, the reclassification of items as influenced by
foils, and the indications which also arise from other research into
wrong-answer patterns.

Since this part of the study is exploring the possibility of a
hierarchy among the foil categories, the rank order of the two variables
just described was determined. That is, for each interpreted foil cluster
the average of the average total-correct scores associated with its foils
was found, and these cluster averages were ranked. This procedure
established an arbitrary ordinal relationship among the interpreted foil
clusters. A ranking of the average within-item ranks was also estab-
lished. If there is no functional relationship between foil type and
total-correct score, the rank order correlation between these two
ranking systems should be near zero. That is, the within-item and
between-item characteristics would not be related to the average total-
correct score on each foil in a functional manner. Table 14 shows the
ranking of foil clusters by two independent methods (see: p. 85).

Each of the two ranking systems in Table 14 tends to support the
more general ranking suggested by the reclassification pattern.
The comparison between the two ranking systems gave a rank order correla-
tion of ρ = .68, which is significant (p = .01, two-tailed test,
N = 12).
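The significance of ρ = .68 for N = 12 can also be checked without tables by simulating the null hypothesis that the two rankings are unrelated. The sketch below assumes untied ranks (Table 14 contains ties, so this is only an approximation) and a fixed random seed; it is a Monte Carlo estimate, not the procedure used in the study, and the function names are illustrative.

```python
import random


def spearman_no_ties(rx, ry):
    """Spearman's rho for two untied rank vectors: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def permutation_p_value(rho_observed, n, trials=20000, seed=1):
    """Monte Carlo two-tailed p-value for rho when the two rankings are independent."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    hits = 0
    for _ in range(trials):
        shuffled = base[:]
        rng.shuffle(shuffled)
        if abs(spearman_no_ties(base, shuffled)) >= abs(rho_observed):
            hits += 1
    return hits / trials


p = permutation_p_value(0.68, 12)
print(p)
```

Each shuffle plays the role of a hierarchy built from an unrelated variable; the p-value is the share of such chance hierarchies that agree with the reference ordering at least as strongly as the observed one.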

Other findings support this hierarchy. For instance, Powell and
Isbister (1969) found that IA foils correlated negatively with Synthesis
items. Foils in this category would be expected to have the fairly low
rank this study suggests for IA foils. On the other hand, placing this
foil type just above the content-linked misreading categories may be
placing it too low in the hierarchy. Similarly, RT foils seemed to be
content-linked, and would be expected to have a low rank for this reason.
Support for this extrapolation was also evident. The tendency for Irr
foils to distract middle and high level performers has already been noted
in Powell (1968) and Powell and Isbister (1969). Thus the ranking by
average total-correct score, which seems to place Irr foils in sixth
place, would seem to be too low. Similarly, the ranking on a within-item
basis as sharing top place would seem to be too high.

By itself, the ranking on total-correct scores would seem to be 
TABLE 14

RANKING OF FOIL CLUSTERS BY TWO
INDEPENDENT METHODS

Interpreted foil        Rank by total-correct        Rank by average within-item
clusters                averages within clusters     rank by cluster (a)

Cluster     Class       Average      Rank            Average      Rank

W1          OG          12.0         1.0             2.4          1.5
W13         O2          11.9         2.0             2.67         4.5
W6          NS          11.8         3.0             2.6          3.0
W5          Inv         11.7         4.0             2.7          6.0
W9          OS          11.6         5.0             3.0          9.5
W14         Irr         11.5         6.0             2.4          1.5
W15         Tr          11.3         7.0             2.75         7.0
W3          CM          11.2         8.5             2.9          8.0
W8          IA          11.2         8.5             2.67         4.5
W4 & W10    RT          11.1         10.0            3.25         11.5
W11         WW          11.0         11.0            3.0          9.5
W12         O1          10.8         12.0            3.25         11.5

a. The within item rank includes the right answer and does not drop
any foils.



fairly closely related to a hypothetical foil hierarchy on a between-
item basis. Similarly, the within-item ranking would seem to be more
closely related to the influence of foil categories upon the items.
However, these two variables are clearly related, as indicated by the
significant rank order correlation between them. These results suggest
overall systematic answering which influences the statistical outcomes
of both within-item and between-item events.

Perhaps the most reasonable arrangement for the hierarchy would
be to consider both events to be interdependent. The simplest approach
in this case is to take the average of the two separate ranks for each
foil category and to rearrange the foil categories accordingly.
The resulting hierarchy would then reflect the influence of both between-
item and within-item events. Table 15 gives the results of this
procedure.
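Averaging the two ranks of each category and sorting on the result can be sketched as follows. The rank values are transcribed from Table 14 as read from the source, and the names `TABLE_14_RANKS` and `rerank_by_average` are illustrative.

```python
# Ranks transcribed from Table 14: (rank by total-correct average, rank by within-item rank).
TABLE_14_RANKS = {
    "OG": (1.0, 1.5), "O2": (2.0, 4.5), "NS": (3.0, 3.0), "Inv": (4.0, 6.0),
    "OS": (5.0, 9.5), "Irr": (6.0, 1.5), "Tr": (7.0, 7.0), "CM": (8.5, 8.0),
    "IA": (8.5, 4.5), "RT": (10.0, 11.5), "WW": (11.0, 9.5), "O1": (12.0, 11.5),
}


def rerank_by_average(ranks):
    """Average each category's ranks and order the categories from lowest average upward."""
    averages = {cat: sum(pair) / len(pair) for cat, pair in ranks.items()}
    return sorted(averages, key=averages.get), averages


order, averages = rerank_by_average(TABLE_14_RANKS)
print(order)
# ['OG', 'NS', 'O2', 'Irr', 'Inv', 'IA', 'Tr', 'OS', 'CM', 'WW', 'RT', 'O1']
```

The resulting order is that of Table 15, except that Tr (mean rank 7.0) and OS (7.25) are separated here, whereas Table 15 lists both at 7.5.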



TABLE 15

RERANKING OF FOILS
BY AVERAGE RANK

Cluster       Foil Classification     New Rank

W1            OG                      1.0
W6            NS                      2.0
W13           O2                      3.0
W14           Irr                     4.0
W5            Inv                     5.0
W8            IA                      6.0
W9            OS                      7.5
W15           Tr                      7.5
W3            CM                      9.0
W11           WW                      10.0
W4 & W10      RT                      11.0
W12           O1                      12.0
This rearrangement put RT nearer the bottom, OS nearer the
middle (as its pivotal position suggested), and Irr nearer the top than
the average total-correct rank did, which seems to be reasonable relative
to the available evidence. This new reordering has not been used in sub-
sequent analysis because of the problem of the relevance of the within-
item ranking to this hierarchy. Further research is needed before the
most probable sequence in the hierarchy can be established.

The lowering of the performance level of synthesis items in C_
and C_ by Tr, CM, and RT foils further supports this hierarchy. Foil
17D_ was reclassified as OS in the logico-semantic analysis of the wrong
answer cluster. The pivotal position of the OS category would seem to
support the "double-strategy" interpretation of C2. Foil _D_ was
changed from OG to CM, which suggests that this foil, and another which
was unclassifiable, may have combined to lower this analysis item to a
comprehension level. Right answer cluster C2 may, therefore, be a
comprehension level cluster and not a double-strategy cluster as
suggested earlier. These suggestions are too tentative to alter its
"undetermined" classification.

Further support for this foil hierarchy may be found by taking
the new foil ranks and determining from these the average foil rank of
each right answer cluster. Table 16 (see: p. 88) gives this
information.
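Following footnote a of Table 16, which bases the averages on the Table 14 ranks by total-correct average, a right answer cluster's average foil rank is the mean rank of the foil categories it contains, counting repeats. The sketch below uses only the four rows of Table 16 whose entries are fully legible; the dictionary and function names are illustrative.

```python
# Foil-category ranks by total-correct average, transcribed from Table 14.
FOIL_RANK = {
    "OG": 1.0, "O2": 2.0, "NS": 3.0, "Inv": 4.0, "OS": 5.0, "Irr": 6.0,
    "Tr": 7.0, "CM": 8.5, "IA": 8.5, "RT": 10.0, "WW": 11.0, "O1": 12.0,
}

# Foil types present in four right-answer clusters, repeats included, from Table 16.
CLUSTER_FOILS = {
    "C10": ["NS", "NS", "O2", "O1", "Irr", "Tr"],
    "C2": ["OG", "OS", "OS", "Irr", "Tr", "RT"],
    "C7": ["OS", "IA", "Inv", "WW", "WW", "CM", "RT"],
    "C9": ["WW"],
}


def average_foil_rank(foils, foil_rank=FOIL_RANK):
    """Mean rank of the foil categories found in one right-answer cluster."""
    return sum(foil_rank[f] for f in foils) / len(foils)


for cluster, foils in sorted(CLUSTER_FOILS.items(), key=lambda kv: average_foil_rank(kv[1])):
    print(cluster, round(average_foil_rank(foils), 1))
# C10 5.5 / C2 5.7 / C7 8.3 / C9 11.0
```

These values agree with the averages printed in Table 16 for the same clusters.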

Table 16 shows three additional characteristics which may add
support to the concept of a foil hierarchy. First, the order conforms
to Bloom's Taxonomy, with C2, the apparent split-strategy cluster, falling
between the analysis and undetermined categories. Second, evaluation
fell out of order, as occurred in the Kropp, Stoker, and Bashaw (1966)



TABLE 16

ORDERING OF RIGHT ANSWER CLUSTERS BY AVERAGE FOIL RANK

Right     Right      Interpreted                      Foil Types Present              Average
Answer    Answer     Classification                   in Clusters (b)                 Foil
Rank      Cluster                                                                     Rank (a)

1         C5         Analysis                         illegible                       illegible
2         C_         Analysis                         illegible                       illegible
3         C_         Analysis                         Irr (2), NS                     illegible
4         C8         Synthesis                        Inv (3), NS, CM                 illegible
5         C10        Analysis                         NS (2), O2, O1, Irr, Tr         5.5
6         C2         Undetermined                     OG, OS (2), Irr, Tr, RT         5.7
                     (Analysis or Comprehension?)
7         C7         Evaluation                       OS, IA, Inv, WW (2), CM, RT     8.3
8         C_         Undetermined                     OS, IA, Inv, CM (2), O1         8.5
9         C_         Undetermined                     CM (3), WW                      illegible
                     (Comprehension?)
10        C9         Undetermined                     WW                              11.0

a. These figures are the average of the foil ranks by total-correct averages within cluster as given in
Table 14, page 85.

b. The number in parentheses indicates the frequency of recurrence of a particular foil classification
within the right-answer cluster when this frequency is greater than one.



study. Third, cluster C9, if legitimately classified by one foil,
fell at the bottom. The Procrustes rotation match-to-content suggested
that this cluster might be a content-oriented cluster. In this case,
its location was also reasonable.

Further support for the hierarchy can be found by determining
the rank order of the right answer clusters in several ways. When the
between-item and within-item foil ranks, as given in Table 14, page 85, are
used to calculate the right answer cluster rank, a highly significant
correlation is found (ρ = .90, p < .001 for N = 9). This finding
supports both the hierarchy and the apparent influence of foils upon
item performance. All of the other possible rankings produced correla-
tions which were not significant, including a comparison between ranking
by average foil rank and average total-correct scores.

Results Related to the Subsumptive Property of the Taxonomy

Another interesting finding in the results just reported is
that the average difficulty of each cluster seems to be uncorrelated
with the rank of each cluster. This correlation is ρ = .05 and does not
change much if the within-item rank or the composite rank is used. This
finding would seem to contradict the subsumption characteristic assumed
to be part of Bloom's Taxonomy.

From the data available in this study, another test of this
subsumption hypothesis can be made. If the subsumptive property holds,
successively higher members of the right answer hierarchy should be
obliquely related. The Procrustes rotation gave a good fit to the
target for the cluster analysis (see: p. 67). In general, the
relationship among the primary axes would seem to be orthogonal.
However, the sampling distribution of these correlations is unknown, so
their statistical significance cannot be determined. In this case, the
actual values of these correlations, arranged in order on the basis of
the hierarchy of right answer clusters, may reveal a systematic pattern.
Table 17 gives these data.



TABLE 17

POSSIBLE SYSTEMATIC OBLIQUITY BETWEEN ORDERED CLUSTERS (a)

          C7       C10      C6       C5       C1

C10      -.12
C6       -.03     -.15
C5        .31      .20      .18
C1       -.17      .12     -.07     -.31
C8       -.08      .09     -.03     -.29      .06

a. This table is based on Table 10.



If the point is stretched to the limit and correlations
greater than or equal to an absolute value of .15 are considered
oblique (|r| ≥ .15), there are seven of these fifteen relationships
that could possibly be considered oblique. In this case three of these
relationships are oblique along the diagonal of Table 42, but all are
within the analysis grouping. Four of these seven are found among the
six relationships of the analysis grouping. The other three are among
the nine relationships outside the analysis grouping. The highest of
these correlations still represented an angle larger than 70°
(r = -.31), and the greatest proportion of these slightly nonorthogonal
angles is found among analysis clusters. Such slight pattern as might
have existed did not seem to be too important outside of the analysis
group of clusters. The relationships among the clusters did not seem to
support this assumed subsumptive relationship between levels of the
Taxonomy. Kropp et al. (1966) found that the order from the simplex
analysis did not consistently support this subsumptive property either
(see: p. 21). In short, the lack of correlation between average item
difficulty and cluster order would seem to be further evidence towards
the probable refutation of the assumed subsumptive relationship between
the levels of Bloom's Taxonomy.
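The count of seven oblique relationships and the angle quoted for r = -.31 follow from treating each correlation between factor axes as the cosine of the angle between them. A minimal sketch, with the Table 17 values transcribed in row order and the .15 cut-off taken from the text; the function name is illustrative.

```python
import math

# The fifteen correlations among ordered clusters in Table 17, row by row.
TABLE_17 = [-.12, -.03, -.15, .31, .20, .18, -.17, .12, -.07, -.31,
            -.08, .09, -.03, -.29, .06]


def acute_angle_degrees(r):
    """Acute angle in degrees between two unit factor axes whose correlation (cosine) is r."""
    return math.degrees(math.acos(abs(r)))


oblique = [r for r in TABLE_17 if abs(r) >= .15]
print(len(oblique), "of", len(TABLE_17))  # 7 of 15
print(round(acute_angle_degrees(-.31), 1))  # 71.9
```

An acute angle of about 72° is a departure of only about 18° from orthogonality, which is why even the largest correlation is treated as at most slightly oblique.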

The fact that the correlation between the order of the right
answer clusters and the average total-correct score was negative
suggests a general tendency for examinees to do better on low level
items, which suggests a ceiling effect present in this difficult test.
The test was also shown to be difficult by the average total-correct
score, which was 12.19 out of a possible 30 items for Group A. This
possibility found further support in the cross-validation part of the
study.

Cross-validation of the Analysis

The multiple interpretation hypothesis supported thus far, and
the apparent foil hierarchy, suggested two possibilities with respect to
cross-validation. First, the attempts made to cross-validate these
findings might not be successful. Second, other evidence for systematic
responses may therefore have to be sought, such as the hierarchy, which
might be supported more strongly in the cross-validation than the
individual clusters.

The total-correct means and variances for each of these two
groups on the experimental test were used in a t-test for independent
samples in order to confirm the equivalence of the test scores of these
two groups. Table 18 contains these data.



TABLE 18

COMPARISON BETWEEN GROUP "A" AND GROUP "B"
ON TOTAL-CORRECT SCORES

Group A             Group B             df      t       p (a)

X̄1 = 12.19          X̄2 = 11.91          275     .83     .40
S1² = 8.62          S2² = 7.60
N1 = 139            N2 = 138

a. Probability of t equal to or larger than .83 is given for a
two-tailed test.



On the basis of the results given in Table 18, page 92, the two
groups would seem to belong to the same population so far as the
experimental test means and variances are concerned.
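The Table 18 comparison can be recomputed from its summary statistics with a pooled-variance t-test. The sketch assumes that S² in Table 18 is the unbiased sample variance and uses a normal approximation to the two-tailed p-value, which is adequate with 275 degrees of freedom; the function names are illustrative.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Student's t for two independent samples using the pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def two_tailed_p_normal(t):
    """Two-tailed p-value from the standard normal, a close approximation when df is large."""
    return math.erfc(abs(t) / math.sqrt(2))


# Summary statistics from Table 18 (Group A, then Group B).
t, df = pooled_t(12.19, 8.62, 139, 11.91, 7.60, 138)
print(df, round(t, 2), round(two_tailed_p_normal(t), 2))
```

With the rounded values of Table 18 this gives t of about 0.82 and a two-tailed p of about .41, close to the tabled t = .83 and p = .40.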

Cross-validation of right answers. Three different comparisons
were used in the cross-validation of item clusters: 1) Those right
answers from the advance classification which were used to help identify
the item clusters were checked against the clusters of Group B. 2) The
clusters which occurred from the answers of Group A were compared with
the clusters from Group B. 3) The right answer clusters were divided
into three groups of about equal size based on their average foil rank,
and this division was cross-validated. Table 19 gives the first two of
these comparisons. The code C' is used to refer to Group B's clusters
(see: p. 94).

The rule used in Table 19 (see: p. 94), once again, was the
most frequent repetition of items within clusters for Group A and Group
B. Only four of the 12 items (33 per cent) which were grouped by the
advance classification and retained this grouping in the clusters were
found to cross-validate in Group B. Thus, the advance classification
holds up about as well (or as badly) in cross-validation as it did in
the clustering.
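One reading of the "most frequent repetition" rule is that each Group A cluster is paired with the Group B cluster sharing the most of its items, and an item cross-validates when it reappears in the paired cluster. The sketch below implements that reading with hypothetical clusters; it is an assumption about the rule rather than the study's documented procedure, and ties are broken by taking the first Group B cluster listed.

```python
def match_clusters(clusters_a, clusters_b):
    """Pair each Group A cluster with the Group B cluster sharing the most items."""
    return {
        name_a: max(clusters_b, key=lambda name_b: len(items_a & clusters_b[name_b]))
        for name_a, items_a in clusters_a.items()
    }


def cross_validated_items(clusters_a, clusters_b):
    """Items that reappear in the Group B cluster paired with their Group A cluster."""
    matches = match_clusters(clusters_a, clusters_b)
    return {
        item
        for name_a, items_a in clusters_a.items()
        for item in items_a & clusters_b[matches[name_a]]
    }


# Hypothetical clusters of item numbers.
group_a = {"C1": {1, 2, 3}, "C2": {4, 5, 6}}
group_b = {"B1": {1, 2, 5}, "B2": {3, 4, 6}}
kept = cross_validated_items(group_a, group_b)
total = sum(len(items) for items in group_a.values())
print(sorted(kept), len(kept) / total)  # items 1, 2, 4, 6 cross-validate: 4 of 6
```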

Comparing cluster by cluster, there were ten items (33 per cent)
in the clusters from Group A which occurred in equivalent clusters for
Group B. Cluster C'5 contained two members from C5 but also had all of
C9 in it (items 15 and 18), which leaves the definition of C'5 ambiguous.
In any case, it was evident that the clusters themselves did not
cross-validate any better than the advance classification.



TABLE 19

CROSS-VALIDATION OF THE RIGHT ANSWERS OF
GROUP A BY GROUP B FROM THE ADVANCE
CLASSIFICATION AND THE ITEM CLUSTER

Group A                    Group B
Cluster      Item          Cluster      Item

[Cluster and item entries illegible in the source scan.]

a. The numbers in italics cross-validate the advance classification.

b. The starred (*) numbers cross-validate the item clusters.




Table 20 (see: p. 95) has the right answer clusters arranged by
their average foil rank (based on the total-correct averages) as given in
Table 16 (see: p. 88), and arranged into groups of three clusters, with
items 15 and 18 dropped. The cross-validation was then repeated.

Instead of 33 per cent there is now 57 per cent cross-validation,
although with only three groups to match instead of nine. An increase



TABLE 20

CROSS-VALIDATION OF ITEMS, GROUPED BY AVERAGE FOIL RANK

Group          Proportion Cross-validating

High           .78
Middle         .22
Low            .70
Total          .57

[Group A and Group B cluster and item lists illegible in the source scan.]

Those clusters did not cross-validate: C'2 (items 3, 14, 17, 29);
C'_ (items 16, 28); C'10 (items 23, 24).

a. Items in italics cross-validated by group.
of this sort would be expected. More interesting is the distribution
of the shifts between Group A and Group B clusters. If the pattern
for Group A is taken as a reference and the order is considered to
be fixed, a shift is considered to be +1 if the mobile item in the
Group A cluster is found in a Group B cluster associated with items
one cluster higher for Group A. Thus, a +1 shift occurred for item 13
in C_, clustered with item 5 in C'_, where the latter is in C_.
Similarly, item 10 in C'_ would represent a -1 shift. Table 22
summarizes these shifts.
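The shift convention can be written down directly: Group A's cluster order is fixed, each Group B cluster is identified with a Group A cluster, and an item's shift is the number of places its new cluster stands above (+) or below (-) its old one. The data and function name below are hypothetical, and identifying each Group B cluster with one Group A reference cluster is an assumption about how "associated with items one cluster higher" was put into practice.

```python
def item_shifts(order_a, cluster_of_item_a, reference_of_b, cluster_of_item_b):
    """Signed shift per item: +1 means the item moved to a cluster one place higher."""
    position = {name: i for i, name in enumerate(order_a, start=1)}  # 1 = top of the hierarchy
    shifts = {}
    for item, cluster_a in cluster_of_item_a.items():
        cluster_b = cluster_of_item_b.get(item)
        if cluster_b is None:
            continue  # item was not placed in any Group B cluster
        shifts[item] = position[cluster_a] - position[reference_of_b[cluster_b]]
    return shifts


# Hypothetical hierarchy of Group A clusters, top first.
order_a = ["Ca", "Cb", "Cc"]
cluster_of_item_a = {1: "Ca", 2: "Cb", 3: "Cc", 4: "Cc"}
# Each Group B cluster is identified with the Group A cluster whose items it mostly holds.
reference_of_b = {"Ba": "Ca", "Bb": "Cb", "Bc": "Cc"}
cluster_of_item_b = {1: "Ba", 2: "Ba", 3: "Bc", 4: "Ba"}
print(item_shifts(order_a, cluster_of_item_a, reference_of_b, cluster_of_item_b))
# {1: 0, 2: 1, 3: 0, 4: 2}
```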



TABLE 22

MOBILITY OF ITEMS BETWEEN GROUP A
AND GROUP B IN TERMS OF SHIFTS

                    0 Shift    ±1 Shift    ±2 Shifts    ±3 Shifts    Larger Shifts
Number of items        10          9            3            3              3

[Item numbers in each column illegible in the source scan.]
Table 22 (see: p. 96) shows that the mean absolute shifting of
items between the two groups was slightly more than one place in the
hierarchy (mean = 1.31). This shifting was less than half of the shift
expected if the items had been randomly rearranged from Group A to
Group B. If all possible shifts were equally probable, the mean shift
would be 3.24. In fact, 22 of the 28 items shifted two steps or less,
which accounts for 79 per cent of the items.
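The comparison between observed and random shifting can be sketched as a mean absolute shift set against a chance baseline. The baseline here assumes that an item's old and new positions are independent and equally likely over n places; that model is an assumption of the sketch and need not be the one behind the figure of 3.24 quoted above. The function names and inputs are hypothetical.

```python
from itertools import product


def mean_absolute_shift(shifts):
    """Mean of |shift| over items."""
    return sum(abs(s) for s in shifts) / len(shifts)


def uniform_baseline(n_positions):
    """Mean |i - j| when old and new positions are independent and uniform over n places."""
    pairs = list(product(range(n_positions), repeat=2))
    return sum(abs(i - j) for i, j in pairs) / len(pairs)


print(mean_absolute_shift([0, 1, -1, 2]))  # 1.0
print(uniform_baseline(9))  # 80/27, about 2.96
```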

A possible explanation for these shifts can be found in the
hypothesized multiple interpretations. This hypothesis would suggest
that, in spite of the homogeneity of these two groups based upon total-
correct scores, these two groups were clearly not homogeneous when it
came to clustering of the items into item-homogeneous subtests. The
clustering was based upon correlations which were sensitive to marginal
totals. The stability of these marginal totals could be expected to be
affected by the range of interpretations of the items within the groups
concerned. This shifting could, therefore, be a product of the
heterogeneity of the examinees and of the effectiveness of the item in
communicating a limited range of possible interpretations.

Cross-validation of wrong answers. An identical procedure to
the one used for right answers was used with the wrong answers.
Table 21 (see: p. 98) gives the cross-validation between the advance
foil classification and the individual foil clusters between Group A
and Group B.

In Table 21 we find that of the advance classification only
three foils out of 16 (or 19 per cent) cross-validate, as compared with
the 16 out of 60 foils (or 26 per cent) which help to identify the
clusters. Once again, the advance classification and the clusters
cross-validate in about the same proportion. Wrong answers seem to
cross-validate about as well as right answers.

The third comparison, once again, was between foil clusters
grouped into three groups based upon the average foil rank. Table 23
(see: p. 100) gives this comparison.

In Table 23 the high group again cross-validated best using the
hierarchy. With W7 dropped as uninterpretable, 27 of the remain-
ing 51 foils (or 53 per cent) cross-validated, compared with 16 out of
28 (or 57 per cent) for right answers, for a combined total of 43
out of 79 alternatives (or 54 per cent).

Although the wrong answers showed a wider range of shifts, 30
foils had a shift of ±3 or less (or 59 per cent), compared with a
probable mean shift of 4.83 for random shifts. Foils, though less stable,
showed the same trend toward stability as right answers. Once again the
multiple interpretation hypothesis could account for the lack of stability.

The joint cross-validation combining right and wrong answers by
group was: high group 21 out of 28 (or 75 per cent); middle group 14
out of 28 (or 50 per cent); and low group 8 out of 23 (or 35 per cent).
That is, the stability increases by about the same proportion (50 per
cent) from low to high. This increase in stability was also found
among the wrong answers for the Proverbs Test (cf. Powell, 1968).

The Prediction Value of the Experimental Test

In addition to the scores and individual responses on the
experimental test, two achievement scores were obtained for most
examinees. There was some loss of data, leaving Group A with 125
members and Group B with 120 members.

The first achievement test (Achievement Test I) was given with



TABLE 23 

CROSS-VALIDATION BY GROUPING OF WRONG ANSWER CLUSTERS 
the experimental test as a subtest for a midterm examination. The
scores used for predictive purposes do not contain the experimental
test scores. The second achievement test (Achievement Test II) was the
final examination in the same course. The relationship among these
tests is presented in Table 24.



TABLE 24

CORRELATIONS BETWEEN THE TESTS IN THIS STUDY

                          Experimental     Achievement     Achievement
                          Test             Test I          Test II

Experimental Test         1.000
Achievement Test I         .224            1.000
Achievement Test II       (illegible)       .414           1.000



As shown in Table 24, the two achievement tests were moderately correlated (r = .414). The relationship between the experimental and achievement tests was considerably weaker, particularly with respect to Achievement Test II.

In order to establish the predictive validity of the subdivisions of the experimental test, several comparisons were run, using a stepwise multiple regression technique in all cases. Each achievement test was predicted separately for Group A and for Group B. The following predictions were made.

1. Total-correct score on the experimental test predicting the total-correct scores of the achievement tests.

2. Right-answer subtest scores on the experimental test, with the subtests defined by the advance classification of items, predicting the total-correct scores on the achievement tests.

3. Combined right-answer subtest scores and wrong-answer subtest scores, with the subtests defined by the advance classification, predicting the total-correct scores on the achievement tests.

4. Scores on right-answer subtests defined by interpreted clusters used to predict the total-correct scores on the achievement tests.

5. Scores on combined right-answer and wrong-answer subtests defined by all interpreted clusters used to predict the total-correct scores on the achievement tests.

6. Scores on right-answer subtests defined by grouping right-answer clusters by the foil hierarchy used to predict the total-correct scores on the achievement tests.

7. Combined right- and wrong-answer subtest scores defined by grouping of clusters on the basis of the foil hierarchy used to predict the total-correct scores on the achievement tests.

In each case the total-correct score of each achievement test was being predicted.
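The text specifies only that predictors entered if they made a significant contribution (footnote b of Tables 25 and 26). The sketch below is therefore an illustration under stated assumptions, not the program used in the study. It uses forward selection only (no removal step) and a fixed F-to-enter threshold, `f_enter = 4.0`, as a stand-in for that significance rule. The function names and the threshold are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection: add the predictor with the largest partial F until none reaches f_enter.

    f_enter is an illustrative threshold (roughly p = .05 for one added predictor
    in a sample of about 120); the study reports a significance level, not an F.
    """
    n, p = X.shape
    selected, r2 = [], 0.0
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            r2_new = r_squared(X[:, cols], y)
            k = len(cols)
            # Partial F for one added predictor: (gain in R^2) / ((1 - R^2) / (n - k - 1)).
            f = (r2_new - r2) / ((1.0 - r2_new) / (n - k - 1))
            if best is None or f > best[1]:
                best = (j, f, r2_new)
        j, f, r2_new = best
        if f < f_enter:
            break
        selected.append(j)
        r2 = r2_new
    return selected, r2


# Hypothetical data: 120 examinees, 5 subtest scores, criterion driven by subtests 1 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = 2.0 * X[:, 1] + 1.0 * X[:, 3] + rng.normal(size=120)
selected, r2 = forward_stepwise(X, y)
print(selected, round(r2, 3))
```

Each of the seven predictor combinations listed above would correspond to a different choice of columns for `X`.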

Table 25 (see: p. 103) gives the results of these predictions for Group A.

TABLE 25

STEPWISE REGRESSION OF GROUP A DATA USING
SEVERAL COMBINATIONS OF VARIABLES^a

a. The combinations of variables are defined by number as given on page 102.

b. Only those predictors which made a significant contribution (p ≤ .06) to the prediction were included in this table.



Table 25 gives the correlation between the total-correct score on the experimental test and each achievement test, and the multiple correlation (R) achieved by the significant variables (p ≤ .06). The squared multiple correlations (R²) are also given to indicate the amount of variance accounted for in the predictions.

A similar set of data for Group B follows in Table 26 
(see: p. 104). 
a. The variables used with Group B were identical in definition to those used in Group A.
b. Only those predictors which made a significant contribution (p ≤ .06) to the prediction were included in this table.



There are two considerations relevant to Tables 25 and 26. First, the value of the procedure is partly determined by the amount of variance accounted for (as given by R²). Using this criterion, there is a consistent improvement in prediction when the scores on wrong-answer subtests are included in the analysis. When this was done, the wrong-answer variables, in general, accounted for more of the variance than the right-answer variables within the same solution. The interpreted clusters gave a better solution than the advance classification, and both of these were better than the grouping by the foil hierarchy. The poorest predictor was the total-correct score on the experimental test.

The second consideration is the statistical significance of the improvement of these values of R² when compared with each other. The formula for this comparison is:

$$F = \frac{R_1^2 - R_2^2}{1.00 - R_1^2} \times \frac{N - K - 1}{K - L}$$

where N is the number of persons,
K is the number of independent predictors in R_1^2,
and L is the number of independent predictors in R_2^2.

This procedure gives the usual F test, with K - L degrees of freedom for the numerator and N - K - 1 for the denominator. The results of these calculations, derived from the data in Tables 25 and 26 (see: pp. 103 and 104), are given in Tables 27 to 30, which follow (see: pp. 106 to 109).
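As a minimal sketch, the comparison can be written as a one-line function. The R² values and predictor counts below are hypothetical, since the entries of Tables 25 and 26 are not reproduced here; only N = 125 matches the size of Group A. A p value could then be read from an F table, or from `scipy.stats.f.sf(F, K - L, N - K - 1)` if SciPy is available.

```python
def f_increment(r2_1, r2_2, n, k, l):
    """F for the difference between two squared multiple correlations.

    r2_1 uses k independent predictors, r2_2 uses l (l < k); n is the number of persons.
    F = (r2_1 - r2_2) / (1 - r2_1) * (n - k - 1) / (k - l), with (k - l, n - k - 1) df.
    """
    return (r2_1 - r2_2) / (1.0 - r2_1) * (n - k - 1) / (k - l)


# Hypothetical values: 125 persons, R^2 rising from .10 with 2 predictors
# to .30 with 6 predictors.
f = f_increment(0.30, 0.10, n=125, k=6, l=2)
print(round(f, 2))  # 8.43
```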

Table 27 gives the significance of the differences between the predictions of Test I for Group A. It shows that the sequence 1 < 2 = 3 < 4 < 5 stands clearly in the diagonal. One factor which may be involved in the equality 2 = 3 is the fact that the number of predictors increases from 5 to 14. If all other values remain the same, an increase in the size of the sample of less than 50 per cent would make the difference between Combinations 2 and 3 significant. Also, no variables made a significant contribution in Combination 6, and 2 variables were significant in Combination 7, which makes 7 a significantly better predictor than 6. Thus, for Group A, when predicting Test I there is a consistent tendency for the combined right and wrong answers to be better predictors than the right answers alone.
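The sample-size argument follows from the formula for F. With both R² values held fixed, F grows in proportion to N - K - 1, so a modest increase in N can carry a difference past the critical value. The sketch below searches for the smallest such N. All R² values, predictor counts, and the critical F of 2.18 are hypothetical. The critical value is held constant, although strictly it falls slightly as N grows.

```python
def f_increment(r2_1, r2_2, n, k, l):
    """F for the gain from r2_2 (l predictors) to r2_1 (k predictors) with n persons."""
    return (r2_1 - r2_2) / (1.0 - r2_1) * (n - k - 1) / (k - l)


def smallest_significant_n(r2_1, r2_2, k, l, f_crit, n_start):
    """Smallest sample size >= n_start at which F reaches f_crit, both R^2 values held fixed.

    f_crit is supplied by the caller and held constant; strictly, the critical F
    falls slightly as n grows, so this errs on the conservative side.
    """
    n = n_start
    while f_increment(r2_1, r2_2, n, k, l) < f_crit:
        n += 1
    return n


# Hypothetical: R^2 of .12 with 4 predictors vs .20 with 10 predictors, 125 persons,
# and an assumed critical F of 2.18 (about the .05 point for 6 and roughly 120 df).
n_needed = smallest_significant_n(0.20, 0.12, k=10, l=4, f_crit=2.18, n_start=125)
print(round(f_increment(0.20, 0.12, 125, 10, 4), 2), n_needed)  # 1.9 142
```

In this made-up case, a 14 per cent larger sample turns a non-significant F of 1.9 into a significant one.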
TABLE 27

Table 28 gives the significance of the differences between the predictions of Test II for Group A (see: p. 107).

There is no similar pattern when predictions are made to the 
future Test II as compared with the concurrent Test I. Some of the F 
values in the equivalent diagonal are large enough that a larger sample 
might make them significant. At the least, all predictor combinations are better than the total-correct score. Although the grouping of alternatives on the basis of the hierarchy (Combinations 6 and 7) does not yield significant differences, these two combinations tend to have the largest F values. Once again, a larger sample size might have made the difference. The cross-validation of the multiple regression coefficients, which follows this section, casts further light on this aspect of the problem.

TABLE 28

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'S
FOR GROUP A WHEN PREDICTING TEST II

Table 29 (see: p. 108) gives the significance of the differences between the predictions of Test I for Group B.

TABLE 29

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'S
FOR GROUP B WHEN PREDICTING TEST I





R² for Variable Combinations 1 to 7^a

(Body of table: F values and significance levels^b for comparisons between pairs of variable combinations; the cell layout is not legible in the source.)

a. Definitions for these variable combinations are given on page 102.
b. Only the probability (p level) for significant differences is shown.
c. * means: Denominator 0, F value indeterminate.



Table 29 is very similar to Table 28 except that none of the predictor combinations is significantly better than any other, including the total-correct scores. These findings are not surprising considering the low level of cross-validation already found for all combinations except the grouping based upon the hierarchy. Although none of the values for these groupings (Combinations 6 and 7) is significant, these two combinations have the highest p values. Once again, a larger sample size might have made the difference.

Table 30 gives the significance of differences between predictions of Test II for Group B.

TABLE 30

SIGNIFICANCE OF DIFFERENCES BETWEEN R²'S
FOR GROUP B WHEN PREDICTING TEST II

Table 30 shows Combination 7 to be a highly significantly better predictor in four of the six cases. When comparing Combination 3 with 7, the R² is larger for Combination 7, but the number of significant predictors is larger for Combination 3; for this reason the formula gives a negative F value. Logically, Combination 7 would therefore be the better predictor in this case as well, since it achieves a larger R² with fewer predictors. Combination 5 has a somewhat larger R² but requires many more predictor variables to achieve this, making the difference clearly not significant.
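The negative F arises because the formula assumes that the combination in the R_1^2 position has both the larger R² and more predictors. If one combination has the larger R² with fewer predictors, F is negative whichever way round the two are entered, and the comparison reduces to a dominance check. The figures below are hypothetical, and `dominates` is a helper introduced only for this illustration.

```python
def f_increment(r2_1, r2_2, n, k, l):
    """F for comparing r2_1 (k predictors) with r2_2 (l predictors), n persons."""
    return (r2_1 - r2_2) / (1.0 - r2_1) * (n - k - 1) / (k - l)


def dominates(r2_a, k_a, r2_b, k_b):
    """True if combination a has at least as large an R^2 with no more predictors than b."""
    return r2_a >= r2_b and k_a <= k_b


# Hypothetical: combination A has the larger R^2 (.19) but fewer predictors (3)
# than combination B (.17 with 6); n = 120 persons.
f_a_first = f_increment(0.19, 0.17, 120, 3, 6)   # A in the R_1^2 position
f_b_first = f_increment(0.17, 0.19, 120, 6, 3)   # B in the R_1^2 position
print(f_a_first < 0, f_b_first < 0, dominates(0.19, 3, 0.17, 6))  # True True True
```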

Three trends seem to emerge from these data. First, combined right and wrong answers generally seem to yield the best predictions.

Second, the best prediction of a concurrent test seems to be found by using the interpreted clusters for the same group.

Third, when predicting remote events or the results of another group, combining the predictor variables on the basis of the hierarchy would seem to give the best predictions.

Cross-validation of the multiple regression coefficients. It is possible to cross-validate multiple correlations by finding the vector product of the validity coefficients for one group and the standardized regression weights for the same variables from the other group, as follows:

$$R^2 = V'W$$

where V' is a row vector of the validity coefficients for one group,
W is a column vector of the standardized regression weights for the corresponding variables from the other group,
and R² is the resulting vector product.
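The sketch below shows the vector product under stated assumptions. Within a single group, the standardized weights W solve R_xx W = V, and V'W is then the ordinary R². Substituting the other group's validity coefficients gives the cross-validated value described above. The correlation matrix and validity coefficients are hypothetical two-predictor values, not data from the study.

```python
import numpy as np


def standardized_weights(rxx, rxy):
    """Standardized regression weights W solving Rxx W = rxy for one group
    (Rxx: predictor intercorrelations, rxy: validity coefficients)."""
    return np.linalg.solve(rxx, rxy)


def vector_product_r2(v_other, w_this):
    """V'W: one group's validity coefficients times the other group's weights."""
    return float(v_other @ w_this)


# Hypothetical two-predictor example.
rxx_a = np.array([[1.0, 0.3], [0.3, 1.0]])
v_a = np.array([0.40, 0.20])   # validity coefficients, group A
v_b = np.array([0.35, 0.25])   # validity coefficients, group B

w_a = standardized_weights(rxx_a, v_a)
r2_a = vector_product_r2(v_a, w_a)       # ordinary R^2 within group A
r2_cross = vector_product_r2(v_b, w_a)   # group B validities with group A weights
print(round(r2_a, 3), round(r2_cross, 3))  # 0.167 0.153
```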
Since combining right and wrong answers seemed to give the best results, only these combinations of variables were cross-validated in this manner. Table 31 (see: p. 112) gives the results of these calculations.

To begin with, Table 31 shows that Combination 3, the advance classification, does not survive cross-validation. Clearly, the best predictor of the concurrent test for Group A was the interpreted clusters (Combination 5), a finding consistent with the findings for the significance of differences. This combination (5) of variables did not cross-validate in any other situation.

For the remote test, Combination 7 proved to cross-validate very well, much better than the significance of the differences would suggest. For Group B, Combination 7 cross-validated about as well as the proportion of item-for-item cross-validation would suggest that it should. These findings also tend to support the suggestion that grouping on the basis of the hierarchy may be the best method for predicting future performance or performance in another group.

The reader is cautioned on two points. First, the correlations between the total-correct scores of the experimental test and the achievement tests suggest that these tests may be dissimilar in the characteristics they are measuring. This situation would be expected to produce lower multiple correlations than tests of greater similarity might achieve. Second, near-zero multiple correlations are easier to cross-validate than higher ones, so the relative stability of these correlations can, by itself, supply only suggestive results.



TABLE 31

CROSS-VALIDATION OF MULTIPLE REGRESSION COEFFICIENTS
FOR COMBINED ANSWERS

                       Combination 5              Second combination*
                       Original   Cross-          Original   Cross-
                       R²         validation R²   R²         validation R²

Group A   Test I       .278          --           .055        .045
          Test II      .156         .032          .085        .098
Group B   Test I       .189         .038          .057        .045
          Test II      .237         .013          .193        .100

* Combination number not legible in the source; -- marks an illegible entry.
Summary of Chapter V

Briefly, the findings as reported in this chapter were as follows:

1. Interpoint distance gave the best statistical solution to the data being considered in this study.

2. Logico-semantic analysis provided reasonable support for the construct validity of the procedure used, provided that alternative classifications are permissible and interpretations are confined to the group under study.

3. There may be a hierarchy of foils which parallels Bloom's Taxonomy and which may influence the way in which items perform.

4. None of the cross-validations was very strong, with the hierarchy tending to be somewhat better supported than other aspects of the analysis.

5. Wrong answers, in general, tended to add significantly to the predictions of both concurrent and future achievement whenever significance was found.

6. Within one group the interpreted clusters gave the best concurrent prediction; otherwise, the grouping on the basis of the hierarchy seemed to produce the best prediction.



CHAPTER VI

CONCLUSIONS AND IMPLICATIONS

The conclusions which can be drawn from this study are discussed in three sections: first, the conclusions relevant to the experimental test used in this study; second, those relevant to the analytical procedure; and finally, those relevant to the systematic response hypothesis.

The implications which can be drawn from this study are discussed in four sections. First, there are the implications of the results of this analytic procedure for the theory of test analysis procedures. Second, there are the implications for the design, construction, and interpretation of taxonomic tests. Third, there are a number of implications of this study for educational practice. Finally, this study has opened enough possibilities for future research that these are discussed in a separate section.

Conclusions Related to the Experimental Test 

Superficially, the experimental test used in this study would seem to have been a weak instrument but, as will be seen, the criteria usually used for evaluation may not have been applicable to this test. Using the usual criteria, for instance, the selection ratio for the right answers on most of the items was low; the biserial correlations of the items with the total-correct scores were also low relative to the size of the corresponding difficulty ratios; the internal consistency based upon the Kuder-Richardson procedure was low; and the interrater reliabilities left a great deal to be desired.
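For reference, the Kuder-Richardson internal consistency mentioned here can be computed from a persons-by-items matrix of 0/1 scores. The sketch below uses formula 20 (KR-20). Note that it scores each item only as right or wrong, which is the dichotomous view the following paragraphs question for this test. The tiny data matrix is hypothetical.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 item scores.

    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / var(X)), where p_j is the proportion
    passing item j, q_j = 1 - p_j, and var(X) is the variance of total scores
    (population form, dividing by the number of persons, to match p_j * q_j).
    """
    n_persons = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n_persons
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / var_total)


# Hypothetical 5 examinees x 4 items in a perfect cumulative pattern.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))  # 0.8
```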

On the other hand, the usual criteria employed for evaluating tests may not be appropriate for this one. Briefly, the desirability of items of middle difficulty is a criterion based upon the assumption that this level of difficulty maximizes the discrimination of the test when all the items are dichotomous variables. When all alternatives are being considered, rather than each item being scored simply right or wrong, this criterion no longer seems to apply.

The biserial correlation of the test items with the total-correct scores is a criterion of the discriminating power of these items, assuming that the test as a whole is highly homogeneous. The low internal consistency of this test suggests that the test was not homogeneous. Evidence for the lack of homogeneity can be found in the logic of the construct model (Bloom's Taxonomy) and in the fact that, in the cluster analysis, the test subdivides into ten clusters of right answers, at least six of which were nearly orthogonal. Also, the loadings of the items on these nearly orthogonal factors were very nearly the original lengths of the vectors in the principal axis matrix from which they were derived. This latter result suggests that the internal consistency within clusters was substantially higher than within the test as a whole. For these reasons, the usual criteria may not apply to this test.

On the positive side, the average total-correct score for persons selecting each alternative was higher for the right alternative than for all other alternatives used, by at least 1.5 points, in 37 out of 58 cases (or 64 per cent), which gives some support to the strength of the test (see: Table 40, p. 150). Although the clusters did not cross-validate very well, when the shifting of items into other clusters was considered it became evident that the items and foils were much more stable than would be suggested by chance alone. Also, the clear evidence for an interacting hierarchy of items and foils would seem to provide strong support for the possibility that the instrument was measuring some systematic characteristics of the examinees.
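The first check in this paragraph can be sketched as follows. For one item, average the total-correct scores of the examinees who chose each alternative. Then ask whether the keyed alternative leads every other alternative actually chosen by at least a given margin (1.5 points here, following the figure in the text). The function names and the six-examinee item are hypothetical.

```python
from collections import defaultdict


def mean_total_by_alternative(choices, totals):
    """Average total-correct score of the examinees who chose each alternative of one item."""
    sums, counts = defaultdict(float), defaultdict(int)
    for alt, total in zip(choices, totals):
        sums[alt] += total
        counts[alt] += 1
    return {alt: sums[alt] / counts[alt] for alt in sums}


def keyed_leads_by(means, key, margin=1.5):
    """True if the keyed alternative's mean exceeds every other alternative actually chosen
    by at least `margin` points (alternatives nobody chose do not appear in `means`)."""
    return all(means[key] - m >= margin for alt, m in means.items() if alt != key)


# Hypothetical item: six examinees, keyed answer "A".
choices = ["A", "B", "A", "C", "B", "A"]
totals = [30, 24, 28, 25, 26, 32]
means = mean_total_by_alternative(choices, totals)
print(means, keyed_leads_by(means, "A"))  # {'A': 30.0, 'B': 25.0, 'C': 25.0} True
```

Counting the items for which `keyed_leads_by` is true would give a tally of the kind reported (37 of 58).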

Precisely what these characteristics were would seem to be more ambiguous than the probability that they were being measured. They were, however, clearly process-oriented characteristics. The doubt about precise definitions arose from two sources: 1) the lower than desirable interrater reliabilities, and 2) the lower than desirable cross-validation of the clusters. Both these weaknesses in the interpretability of the results, however, may be a property of this type of test and not a criticism of it.

If the proportion of the total variance used in the factor solution is considered, 37.7 per cent was accounted for by the six factors for the items and another 48.6 per cent by the 15 factors used for the wrong answers, suggesting a higher internal consistency than the Kuder-Richardson results indicated.

Thus, the instrument displays the following properties:
1. Significantly improved prediction of independent concurrent achievement scores for the same group of examinees, using interpreted clusters from both right- and wrong-answer clusters combined, over the other combinations of scores tested.
2. A clear hierarchical pattern of both items and foils. The right answers and the foil selections seem to interact and to be relatively stable within the hierarchy under cross-validation when the range of shifts is considered.
3. An overall discrimination apparently based upon process-oriented events rather than content-oriented events.

The construct objective was to produce a process-oriented taxonomic test which had predictive value relevant to achievement variables. Whatever criticisms might be made of the test, the results made it clear that it met this construct objective to a reasonable degree and, for this reason, it can be taken to be a valid test. Also, the indirect evidence suggested that the test was probably more reliable than the direct evidence indicated. Precisely which procedures should be used to establish validity and reliability estimates for tests of this type is not yet clear.

Conclusions Related to the Analytic Procedures Used 

The analysis began with phi coefficients. The use of these coefficients can be defended on the grounds that none of the assumptions made for these coefficients was violated by their use. The two variables being related by each coefficient are discrete, since each represents the selections made by the members of the same group for different alternatives. They were dichotomous because an accept-reject decision applies to all alternatives as a requirement of the response procedure. Linear dependencies were removed from the data by partitioning the matrix. Finally, since all values were expressed as frequencies of occurrence, the categories were amenable to appropriate representation by two-point values.
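For concreteness, a phi coefficient between two alternatives comes from the 2x2 table of accept/reject decisions across examinees. The sketch below builds that table from two 0/1 selection vectors. The six-examinee data are hypothetical, and a table with an empty row or column (an alternative nobody chose) would make the denominator zero.

```python
import math


def phi_from_table(a, b, c, d):
    """Phi for a 2x2 table: a = both selected, b = first only, c = second only, d = neither."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def phi_from_selections(x, y):
    """Phi between two 0/1 selection vectors (one entry per examinee)."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)
    return phi_from_table(a, b, c, d)


# Hypothetical selections of two alternatives by six examinees.
x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 0]
print(round(phi_from_selections(x, y), 4))  # 0.7071
```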

The resulting large matrices of phi coefficients were simplified by principal axis factor analysis in order to remove as much measurement error as possible and to maximize the variance accounted for by any particular number of dimensions in the space being used. Beyond this point the procedures seemed to separate into two aspects: those which are commonly used for the study of tests and their results, and those which are not commonly used but were specifically selected to meet problems which may arise in the interpretation of the results.
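The principal axis step can be sketched as an iterated eigen-decomposition of the correlation matrix, with communalities on the diagonal. The text does not state how communalities were initialized or how many iterations were run. Starting from squared multiple correlations and iterating to convergence are assumptions made for this sketch, and the SMC start requires a nonsingular matrix, which the partitioning described above is meant to ensure. The one-factor test matrix is hypothetical.

```python
import numpy as np


def principal_axis(r, n_factors, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring of a correlation (e.g. phi) matrix.

    Communalities start at squared multiple correlations, 1 - 1/diag(R^-1), which
    requires R to be nonsingular; the diagonal is then replaced by the current
    communalities and the leading eigenvectors re-extracted until they settle.
    """
    r = np.asarray(r, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    for _ in range(max_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)          # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # Eigenvector signs are arbitrary; make each factor's loadings sum positive.
    loadings = loadings * np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    explained = (loadings ** 2).sum() / r.shape[0]   # share of total variance
    return loadings, explained


# Hypothetical one-factor matrix built from loadings .8, .7, .6, .5.
lam = np.array([0.8, 0.7, 0.6, 0.5])
r = np.outer(lam, lam)
np.fill_diagonal(r, 1.0)
loadings, explained = principal_axis(r, n_factors=1)
print(np.round(loadings[:, 0], 3), round(explained, 3))
```

The `explained` value is the kind of "proportion of the total variance" figure quoted earlier for the item and wrong-answer factor solutions.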

The results of this study derived from the commonly employed procedures were uniformly ambiguous, inconclusive, or negative, whereas the less common procedures uniformly produced significant results. As indicated, the test itself would seem to have been of questionable value if it were evaluated by the commonly used procedures, and yet it clearly met the construct properties it was designed to meet.

The Procrustes rotations of the factor matrix to fit either content or process in the advance classification produced negative results. The results of the usual analytic rotations on the factor matrices were not reported in Chapter V because they were as uninterpretable as the Procrustes rotations. However, when interpoint distance clusters were used to avoid the problem of rotation, a set of nearly orthogonal groupings of the variables was produced. Unquestionably, the cluster approach produced a statistically satisfactory representation of the data, leaving the researcher with the problem of interpretation to be resolved by non-statistical methods.
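A Procrustes rotation turns a factor matrix A as close as possible to a target matrix B, here a hypothesis pattern derived from the advance classification. The text does not say whether orthogonal or oblique Procrustes rotations were used. The sketch below shows the orthogonal case, using the standard SVD solution, and checks that it recovers a known rotation. All matrices are hypothetical.

```python
import numpy as np


def orthogonal_procrustes(a, b):
    """Orthogonal matrix T minimizing ||A T - B|| (Frobenius norm).

    Standard SVD solution: if A'B = U S V', then T = U V'.
    A is a factor (loading) matrix, B a target pattern such as a 0/1
    hypothesis matrix from an advance classification of the items.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def fit_after_rotation(a, b):
    """Residual sum of squares after rotating A toward B."""
    t = orthogonal_procrustes(a, b)
    return float(((a @ t - b) ** 2).sum())


# Hypothetical check: B is an exact 30-degree rotation of A, so T should recover it.
rng = np.random.default_rng(1)
a = rng.normal(size=(6, 2))
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
b = a @ rot
t = orthogonal_procrustes(a, b)
print(np.allclose(t, rot), fit_after_rotation(a, b))
```

With a real target, a large residual from `fit_after_rotation` is one way a "negative result" of the kind described would show up.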

Cross-validation of clusters was equally disappointing, and contradicted the results of the t-test for uncorrelated samples based upon total-correct scores. Substantial improvements in the proportion of cross-validation were found when the apparent hierarchy of items and foils was taken into account. Additional support for the cross-validation was found in the pattern of shifts of alternatives which occurred among the clusters between the two groups. The data were much more systematic than the usual procedures seemed to suggest.

Using total-correct scores to establish a hierarchy of foils on both a within-item and a between-item basis produced results with a moderate relationship. When these two procedures were used to order the right-answer clusters, however, a highly significant similarity between these latter two orderings was found, leaving little doubt that an interactive ordering between the right and wrong answers was a systematic characteristic of the data. This ordering was unrelated to the total-correct averages for the right-answer clusters and to item difficulty, making the use of total-correct scores for the establishment of the hierarchy questionable. However, the shift patterns of the cross-validation suggested a much more stable result than did the cross-validation alone, suggesting that these shift patterns might be used instead of total-correct scores to determine the ordering of the clusters.
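The text does not name the statistic used to compare the two orderings of the right-answer clusters. A rank correlation such as Spearman's rho is one natural choice, and it is assumed in the sketch below. The cluster labels and orderings are hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same clusters (no ties).

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in rank position.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same clusters")
    n = len(order_a)
    rank_a = {c: i for i, c in enumerate(order_a)}
    rank_b = {c: i for i, c in enumerate(order_b)}
    d2 = sum((rank_a[c] - rank_b[c]) ** 2 for c in order_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical orderings of five right-answer clusters, lowest to highest level.
within_item = ["R1", "R2", "R3", "R4", "R5"]
between_item = ["R1", "R3", "R2", "R4", "R5"]
print(spearman_rho(within_item, between_item))  # 0.9
```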

Although all predictions were low, the use of the interpreted clusters did tend to give significantly better concurrent prediction of the total-correct scores on the independent concurrent achievement measure for the group on which the interpretation was attempted. The amount of variance accounted for increased roughly threefold in this case. The validity of using total-correct scores as adequate representations of achievement on the achievement measures was not explored. Finally, the hierarchy also proved to be more broadly stable during cross-validation than the other variables.

Evidently, the more conclusive results were found by the less common procedures. Since these procedures were designed to fit the specific problems raised in this study, and since their statistical adequacy proved to be beyond question, they would seem to be, collectively, a more adequate method of approaching the kind of data this study produced than the more common procedures.

Conclusions Related to the Systematic Response Hypothesis

The adequacy of the experimental test and of the analytic procedures used for the purpose of this study seems to be established to a reasonably acceptable level. Taken alone, the data were sufficiently systematic in both the right- and wrong-answer matrices to establish that the postulate "most if not all of the answers given to multiple choice achievement tests are selected upon a systematic basis" may be a reasonable approach to human performance. The two most relevant findings were the presence of the interactive hierarchy and the increase in predictive validity evident when wrong-answer clusters were included in the regression equations.

If the evidence supporting the multiple interpretation hypothesis is included, the support for this psychological postulate is greatly increased. To begin with, the negative results from the Procrustes rotations for the advance classification by both content and process established the inadequacy of this approach. Logico-semantic analysis clearly established that process variables provided the best interpretation of the clusters. But the failure of the Procrustes rotation with the advance classification, the low cross-validation levels, and the shift patterns clearly indicated that the classifications are not mutually exclusive.

Kropp et al. (1966) found that the same subtests were defined differently for different grade levels, and Powell (1968) reported only about 60 per cent cross-validation by reported reasons for the selection of particular wrong answers. Also, the higher level alternatives tend to be more stable than the lower ones when taken in combination, which is consistent with other findings (see: Powell, 1968). Thus, independent studies also report findings suggesting that the classifications of answers may not be mutually exclusive. A reasonable synthesis of these findings would be to suggest that each item may be interpreted in a variety of different ways. That is, the poor showing on cross-validation and the lower than desirable interrater reliabilities may be a product of multiple interpretations of the items. Strong support for this hypothesis was found in this study in the systematic character of many of the shifts which occurred among the alternatives. These shifts were sufficiently small in range for most items that their possible use in establishing order among the variables could be proposed (see: p. 97). Further support was found in the fact that the prediction of the concurrent test in the interpreted group was the only case where the interpreted clusters had a distinct advantage. Predictions based upon the hierarchy, on the other hand, seemed to be less powerful but more stable over a broader range of time and population. These findings support the multiple interpretation hypothesis as well, because they suggest that specific interpretations have only short-range applicability.

Perhaps the range of applicability of interpreted findings could be increased if the heterogeneity of the examinees upon whom the interpretations are made were reduced.

In summary, the conclusions of this study were:

1. Human performance, when abstracted from responses to multiple choice achievement tests involving higher mental processes, would seem to be systematic, and to display evidence of multiple interpretation of the communication.

2. There would seem to be a hierarchy of foils which parallels the hierarchy of right answers and which influences the way in which each total item performs. The levels of the foils themselves seem to depend upon the systematic ways in which this totality of each item is approached.

3. Wrong answers contain potentially useful information with respect to achievement when higher mental processes are involved.

Before the implications of this study are discussed, a statement should be made concerning the limitations to generalizability apparent in this study.

Limitations to Generalizability

There are several restrictions to the generalizability of the findings of this study which can be derived from the nature of the study and its conclusions. First, the findings of this study do not apply to multiple choice achievement tests where the simple recall of information is the only characteristic being measured. The experimental test was process-oriented rather than content-oriented, and Knowledge level items were considered inappropriate to its format; hence this limitation.

Second, the findings of this study do not apply where the cost of the additional effort required to obtain and interpret categorical information about the examinees is greater than the cost of the information loss, and possible misclassification attendant thereto, incurred by using the much simpler total-correct score method of evaluation.

Third, these findings may not apply when the most competent are being screened from already competent individuals for some specific purpose. A stronger statement in this respect cannot be made because later research may show that wrong answers supply valid information for the purpose in question. For instance, Irr (irrelevancy) foils may identify the most creative individuals among the high performers.

Finally, if a researcher has a valid reason to evaluate the effectiveness of a single treatment given to a heterogeneous group by using a single ordinal dimension for the particular group in question, the findings of this study clearly do not apply.

Implications of This Study to the Theory of Test Analysis

There are several situations where the findings of this study are very important to test theory. The fact that the more common procedures tended to give ambiguous, inconclusive or negative results raises a number of pertinent questions.

Classical test theory begins with the assumption that

$$X_i = T_i + E_i$$

where X_i is the observed score of the ith individual,
T_i is his true score, and
E_i is the measurement error.

However, for multiple choice achievement tests, this observed score (X_i) itself is usually a composite entity obtained from the summation of single events as follows:

$$X_i = \sum_{j=1}^{n} x_{ij}$$

where x_{ij} is binary, being 1 if the ith individual answered item j correctly, otherwise 0, and
n is the number of items in the test.

The issue then becomes: is X_i sufficiently homogeneous for all individuals to justify the use of classical test theory? If it is not, as was evident in the present study, then an alternative approach to the data would seem to be needed, since the more common approaches proved unsatisfactory in this study.
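A toy example shows how the summation can hide heterogeneity. Two hypothetical examinees earn the same X_i = 3 on a six-item test while succeeding on entirely different items. Classical theory gives them the same observed score, and whether that score means the same thing for both is the homogeneity question raised above.

```python
def total_score(item_scores):
    """X_i = sum over items of the 0/1 scores x_ij."""
    return sum(item_scores)


def shared_correct_items(p, q):
    """Number of items both examinees answered correctly."""
    return sum(1 for a, b in zip(p, q) if a and b)


# Hypothetical six-item test: identical totals, disjoint successes.
examinee_1 = [1, 1, 1, 0, 0, 0]
examinee_2 = [0, 0, 0, 1, 1, 1]
print(total_score(examinee_1), total_score(examinee_2),
      shared_correct_items(examinee_1, examinee_2))  # 3 3 0
```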

Within the context of the present study, several considerations must be met by this alternative approach. The phi coefficients used are extremely sensitive to the magnitudes of marginal proportions. For this reason, if particular alternatives are selected for a different range of reasons in two samples of individuals, these alternatives would be expected to migrate to new clusters because of systematic differences between the groups rather than because of measurement error. Also, if the range of reasons within a group of examinees were too broad, the clusters would be expected to be difficult to interpret and possibly not applicable to specific individuals. That is, the first assumption made would be that different reasons for the selection of a particular alternative would be reflected by differences among overall patterns. These arguments suggest the need at the outset for a homogeneous group of examinees. The key to homogeneity in this study would seem to be the shifts in category which occurred upon cross-validation. Perhaps a homogeneous group should be defined in terms of minimizing the shifts which occur in the clustering of the alternatives for any, or all, possible random assignments of the group members to an arbitrary number of groups. In short, the procedure should probably begin by selecting groups of individuals with maximum cross-validity within these groups.
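The sensitivity of phi to the marginal proportions can be seen from the standard fourfold-table formula, $\phi = (ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$. The Python sketch below uses hypothetical response vectors for two alternatives. With equal marginals (5 of 10 examinees selecting each) and perfect agreement, phi reaches 1; when one alternative is chosen by 5 of 10 examinees and the other by 2 of 10, even a perfectly nested pattern yields only 0.5. This ceiling is one way a change in marginals between samples can move alternatives between clusters.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length binary (0/1) vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)  # both selected
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)  # only x selected
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)  # only y selected
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)  # neither selected
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a marginal is 0 or n")
    return (a * d - b * c) / denom


# Same marginals (5 of 10 each), perfect agreement.
same = phi_coefficient([1] * 5 + [0] * 5, [1] * 5 + [0] * 5)
# Different marginals (5 of 10 vs 2 of 10), perfectly nested.
nested = phi_coefficient([1] * 5 + [0] * 5, [1] * 2 + [0] * 8)
print(same, nested)  # 1.0 0.5
```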

The clusters for such groups will be as stable as possible with respect to the determination of their composition. Hence, the possibility of interpreting the resulting clusters should be optimized, as should the applicability of these interpretations to the individuals within the groups.
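The text does not specify how "shifts" in clustering are to be counted. As one possible operationalization (an assumption, not the study's procedure), the Python sketch below compares the cluster labels assigned to the same alternatives in two random halves of a group and reports the proportion of alternative pairs whose co-membership changes, which equals one minus the Rand index. A group for which this proportion stays small across many random splits would count as homogeneous in the sense proposed above. The labels are hypothetical.

```python
from itertools import combinations


def pair_disagreement(labels_a, labels_b):
    """Proportion of alternative pairs whose co-membership differs between
    two clusterings of the same alternatives (equal to 1 - Rand index)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same alternatives")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    shifts = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return shifts / len(pairs)


# Hypothetical cluster labels for six alternatives in two random halves.
half_1 = ["A", "A", "B", "B", "C", "C"]
half_2 = ["X", "X", "Y", "Y", "Y", "Z"]  # label names need not match
print(pair_disagreement(half_1, half_2))  # 0.2 (3 of 15 pairs change)
```

Because only co-membership is compared, the measure does not depend on how the clusters happen to be named in each half.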

With the clusters thus stabilised, it should be possible to determine the clusters using all the data rather than a simplification of it, since the surface-to-surface interpoint distance (d) between the ends of the vectors within the hypersphere can be determined quite simply by assuming that phi (φ) is the cosine of the angle between the arms of the isosceles triangle produced by the vector pairs. In this case the distance (d) is

$$d = \sqrt{2(1 - \phi)}$$
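Taking the vectors to be of unit length (so their ends lie on the surface of a unit hypersphere, which is an assumption about the normalization), the law of cosines gives the squared distance between the ends as $1 + 1 - 2\phi$, so $d = \sqrt{2(1-\phi)}$. The Python sketch below evaluates this for a few values of phi. Because $d$ decreases monotonically as phi increases, clustering on $d$ preserves the ordering given by the phi coefficients.

```python
import math


def chord_distance(phi):
    """Distance between the tips of two unit vectors whose cosine is phi."""
    if not -1.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [-1, 1]")
    # Law of cosines with both arms of length 1: d^2 = 1 + 1 - 2 * phi.
    return math.sqrt(2.0 * (1.0 - phi))


for phi in (1.0, 0.5, 0.0, -1.0):
    print(phi, round(chord_distance(phi), 4))
```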

A tighter level of homogeneity is possible among individuals who have essentially the same response patterns, but these individuals would tend to have only one meaningful cluster composed of all or most of their responses, thus rendering interpretation impossible.

The dimensionality of the data from a homogeneous (by cross-validation) group of individuals would be determined by the minimum number of homogeneous clusters which can be extracted before orthogonality between the centroids of the clusters begins to disappear. Thus, the procedure just outlined, as it relates to the problems raised by the data in this study, provides a unique solution once the stopping criteria are established. Each cluster would be categorical (in the sense given on page 47) and optimally interpretable, assuming that differences in interpretation of the communication by the examinee are characterized by differences in selection pattern.
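The stopping rule described here can be sketched as a check on the cosines between cluster centroids. In the Python sketch below, the alternative vectors, the cluster labels, and the tolerance of 0.2 for treating two centroids as orthogonal are all illustrative assumptions; the study gives no numerical criterion. Extraction of further clusters would stop once the check first fails.

```python
import math


def centroids(vectors, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def centroids_orthogonal(vectors, labels, tol=0.2):
    """True if every pair of cluster centroids has |cosine| <= tol.
    tol is an illustrative threshold, not a value taken from this study."""
    cents = list(centroids(vectors, labels).values())
    return all(
        abs(cosine(cents[i], cents[j])) <= tol
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )


# Hypothetical vectors for four alternatives.
vecs = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
print(centroids_orthogonal(vecs, [0, 0, 1, 1]))  # True: two distinct directions
print(centroids_orthogonal(vecs, [0, 1, 0, 1]))  # False: each cluster mixes both
```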

There may, as the evidence from this study suggests, be an order among the categories which can be determined by the shifts which occur during cross-validation. Poor items would be unstable under cross-validation.

So far as the categories themselves are concerned, these would be expected to be of two types: 1) nominal and 2) ordinal. Nominal categories would be expected to be bimodal, with the modes tending to polarize at the extremities of the potential range within the category. Ordinal categories would be expected to display scalability characteristics within their potential range.
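If "scalability" is read in the sense of Guttman scaling (an assumption; the text does not name a method), an ordinal category could be screened with the coefficient of reproducibility, $CR = 1 - e/(Nk)$, where $e$ is the number of scaling errors, $N$ the number of examinees, and $k$ the number of alternatives in the category. The Python sketch below counts errors as departures from the ideal pattern implied by each examinee's total, which is only one of several counting conventions; values of about 0.90 or more are conventionally taken as acceptable. The response matrices are hypothetical.

```python
def reproducibility(responses):
    """Guttman coefficient of reproducibility CR = 1 - e / (N * k).

    Errors e are counted as departures from the ideal pattern implied by
    each examinee's total score (one of several counting conventions).
    """
    n_people, n_items = len(responses), len(responses[0])
    # Order alternatives from most to least often selected ("easiest" first).
    popularity = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        observed = [row[j] for j in order]
        errors += sum(obs != ideal_bit for obs, ideal_bit in zip(observed, ideal))
    return 1 - errors / (n_people * n_items)


perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
mixed = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(reproducibility(perfect))  # 1.0
print(round(reproducibility(mixed), 3))  # 0.833
```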

Relationships among categories other than the ordinal (hierarchical) one could probably be determined from the relationships among the centroids of the categories. For instance, Powell and Isbister (1969) found a polarity between Invalid Assumptions among the wrong answers and Synthesis items among the right answers. In such a case it might be unnecessary to partition the matrix to remove linear dependencies. If this latter facilitation could be provided by this procedure, a homogeneous test in the classical sense would be one in which the right answers formed a single cluster of the ordinal type. Thus, the procedure just proposed would seem to contain the characteristics of classical test theory as a special case.

It would be reasonable, then, to argue that the findings of this study suggest the need for alternatives to the procedures in common use for the analysis of data from multiple choice tests, and that they suggest a particular procedure which contains the commonly used procedures as a special case.

Implications of This Study Concerning Taxonomic Tests

Bloom (1956) defines his Taxonomy as having three aspects:
1. It is a classification system.

2. It is hierarchically ordered on a "complexity" dimension.
3. Each higher level is formed by combinations of the lower levels.

The two properties of classification and ordering among classes combine to distinguish a taxonomy from other classification systems. Thus the evidence from this study supports the description of both Bloom's Taxonomy and the Guidelines as taxonomies. The evident interrelatedness of these two taxonomies in this study suggests that they may both be part of a single taxonomy.

Concerning the "complexity" dimension, Bloom said: "Our attempt to arrange educational behavior from simple to complex was based upon the idea that a particular simple behavior may become integrated with other equally simple behaviors to form a more complex behavior" [page 18].

The findings of this study, which produced no relationship between total-correct scores and the hierarchy where right answers were concerned, and no relationship between average item difficulty within clusters and the hierarchy, did not help to identify the meaning of the term "complexity." Since the hierarchy could apparently have been produced through cross-validation procedures without recourse to total-correct scores, the meaning of "complexity" becomes even more vague. However, the finding that "higher" level members of the taxonomy tend to be more stable than lower levels suggests that these categories may be the product of "more powerful strategies."

The third aspect of Bloom's Taxonomy noted previously (see p. 127) also deserves attention. This aspect of his definition would seem to arise as a hypothesis from the "complexity" dimension. Of this subsumptive property of the Taxonomy, Bloom himself said that the evidence he could collect to support it was inconclusive (Bloom: p. 19).

Other evidence concerning this subsumptive property is meagre. Kropp et al. (1966) did not find the clear reproduction of the pattern which they expected in the factor analysis of their tests [Kropp: p. 91 ff.]. Also, their simplex analysis did not produce the consistent order that this property would predict (see pp. 24-25).

Powell and Isbister (1969) found that a promax rotation did not 
improve the resolution of factors between subtest scores based upon the 
advance classification for essentially the same test as used in this 
study. The subtests as defined in this study by cluster analysis 
rather than by advance classification showed this same tendency to 
orthogonality. It is premature to be dogmatic, but it is possible that
this subsumptive property may be a hypothesis which will be refuted by 
the evidence. Alternative analytical procedures such as the one 
outlined above may be needed to settle this issue conclusively. 

There are alternative theoretical positions which would predict the possibility that strategic categories may be discrete and hierarchically ordered by "power" rather than subsumptive. Piaget [1963, p. 1-3 ff.], for instance, has suggested that development may involve shifts in the schema, in which case development may be expected to proceed in a series of discrete phases and stages, each of which would be expected to have its own distinctive properties. Such evidence as is available (in particular, the difficulty in determining a data-based definition of "complexity" as just discussed, the apparent tendency for "higher" clusters to be more stable than "lower" clusters, and the broader cross-validation support for the hierarchy than for specific interpretations) adds suggestive support to the latter alternative over the former.

7 Alternatively, "the acquisition of new strategies" [see: Powell, 1967, p. 286 ff.].

Thus, the advent of taxonomic achievement tests raises some issues in connection with the analytic and interpretive procedures used for these tests. Whatever else, the results of this study have clearly shown that these tests produce a genuine taxonomy which might be improved by the systematic development of foils, and by the use of the responses to these foils as information when evaluating and interpreting these tests and when evaluating, interpreting, and predicting the performance of the individuals taking them.

Implications of This Study to Educational Practice 

The findings of this study suggest that tests which are clearly homogeneous regarding internal consistency may form a special case of a broader class of tests which have taxonomic properties. This conclusion has broad implications for their use in the educational setting.

To begin with, the use of the Guidelines would seem to have several practical advantages. First, they simplify item writing because they provide a systematic basis for writing a broader range of foils than can be written without them. Second, the Guidelines give a firmer basis for stating why a foil is wrong. Third, as research further extends the range of Guideline categories and refines their definitions, it may be possible to increase the precision with which such concepts as analysis may be defined, further improving the construct validity of process-oriented taxonomic tests.

Another advantage may arise from the extension of the Guidelines into the Misreading and Misclassification types of foils. Such an extension may link what is now known about the diagnostic characteristics of tests in the areas of content-related performance and skill-related performance. This linkage may make it possible to extend the diagnostic aspect of testing beyond the knowledge and comprehension characteristics of reading and arithmetic into the more abstract characteristics of mathematics and into the more esoteric subjects such as social science, and perhaps even literary appreciation, where the subject matter is clearly open to multiple interpretations.

If diagnostic testing can be coupled through research with improved definitions of educational objectives and of the factors involved in their attainment, teaching could be more nearly like the practice of medicine. In medicine the practitioner classifies a set of characteristics (a syndrome) and uses his knowledge of the effective treatments available to remedy the condition. He then monitors the progress of the treatment. If all goes normally, the condition is corrected. If not, the practitioner modifies treatment (prognosis) and/or orders further tests to modify the classification (diagnosis) of the condition, and if necessary calls in specialists to extend his knowledge resources, moves the patient to the hospital to extend his physical resources, etc.

There are, of course, dissimilarities between medical and educational practice. Medical men deal generally with short-term problems of a clinical, dysfunctional nature. The treatment range at their disposal tends to be drastic and, when appropriate, dramatic in its effectiveness. Educators, on the other hand, generally deal with long-term developmental situations. The procedures available are less spectacular, slower acting, and much more complex. However, learning research is now providing increasingly powerful tools for the educator. Among these are CMI (Computer Monitored Instruction) and CAI (Computer Assisted Instruction). These two procedures alone, along with the meaningful interpretation of right and wrong answers in the terms just indicated, might greatly extend the capabilities of education.

The essential problem with the bright picture just painted is
that, at the moment, it contains too many unanswered "ifs." The next
section spells out some of the research which might be conducted to 
help to make this dream become a reality. 

Further Research Suggested by the Findings of This Study

There are several areas for further research suggested by this 
study. One of these involves the host of problems which an extension 
of test theory could generate. The solution to the "multiple 
interpretation" problem presented in this study, although suggested and 
strongly supported by the findings of this study, is probably only one 
of a range of possible solutions some of which may be more practical 
than others. Those individuals interested in mathematical statistics
could pursue many avenues from this single problem. At a more practical
level, there are a host of numerical analysis problems in the
implementation of the particular procedure proposed in this study. 
Subsequent to the implementation of an effective analytical procedure, there is the possibility of a host of studies into the characteristics of specific tests and classes of test, into the conditions under which nominal and ordinal categories form, and into the types of relationship among categories which are normally found and the conditions under which these relationships occur. Second-order factoring of the centroid matrices seems a logical first step, but perhaps the entire structure could be integrated into a single analytic procedure and a single model.

Another essential area for research involves the formulation and resolution of problems arising from the interpretation of clusters after their statistical characteristics have been determined. Attendant to this problem is the cross-validation of interpretations to independent samples of equivalent or nearly equivalent profile characteristics. A profile in this context is a set of clusters and its attendant statistical and logico-semantic characteristics.

The formation of a generic model for a range of types of test opens the possibility for the computer generation of a test of particular construct characteristics derived from the past performance characteristics of a large number of items in a pool of items. With an even larger pool of items, the computer could generate and administer a branching-type programme tailored to a wide range of individual differences with the aid of researchable adjustments to the test construction model. In this latter case, the computer could update its performance statistics on the item pool as students take the course, and in so doing refine its own course.
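A minimal Python sketch of the kind of self-updating item pool described here follows. Everything in it is hypothetical: items are chosen by their running proportion correct (p-value), and a simple up-and-down (staircase) branching rule, swapped in for illustration, moves the target toward harder items after a right answer and easier ones after a wrong answer. Each recorded response updates the pool's statistics, which is the kind of refinement the paragraph envisages; a practical system would presumably also draw on the cluster and profile information discussed above.

```python
class ItemPool:
    """Hypothetical item pool tracking running proportion-correct (p) values."""

    def __init__(self, item_ids, prior_correct=1, prior_attempts=2):
        # Pseudo-counts start every item at p = 0.5 (an illustrative assumption).
        self.correct = {i: prior_correct for i in item_ids}
        self.attempts = {i: prior_attempts for i in item_ids}

    def p_value(self, item):
        """Running proportion of correct responses to this item."""
        return self.correct[item] / self.attempts[item]

    def record(self, item, is_correct):
        """Update the item's statistics after a student answers it."""
        self.attempts[item] += 1
        self.correct[item] += int(is_correct)

    def next_item(self, target_p, used):
        """Choose the unused item whose p-value is closest to target_p."""
        candidates = [i for i in self.attempts if i not in used]
        if not candidates:
            raise ValueError("item pool exhausted")
        return min(candidates, key=lambda i: abs(self.p_value(i) - target_p))


def branch(target_p, was_correct, step=0.1):
    """Staircase rule: aim at a harder item (lower p) after a right answer,
    an easier one (higher p) after a wrong answer, within [0.05, 0.95]."""
    new_target = target_p - step if was_correct else target_p + step
    return min(0.95, max(0.05, new_target))


pool = ItemPool(["q1", "q2", "q3"])
pool.record("q1", True)
pool.record("q1", True)   # q1: p = 3/4
pool.record("q2", False)  # q2: p = 1/3; q3 stays at 1/2
first = pool.next_item(0.7, used=set())
target = branch(0.7, was_correct=True)
second = pool.next_item(target, used={first})
print(first, second)  # q1 q3
```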

Another area for research could be the reworking of many of the studies related to educational methodology, evaluating the methods with the profile analysis procedure suggested here. Problems of matching teacher, method, student, and programme could be opened to detailed research, paving the way to a much broader use of diagnostic-prognostic practices in education than at present. Studies into the relationships between achievement, personality, intellective, and perhaps even genetic disposition variables would also seem reasonably possible from these small beginnings.

Another area for research could be the precise definition of 
the developmental sequences through which children pass, the optimum 
ways of modifying these sequences toward specific goals and the degree 
to which these sequences can be modified. The charting of developmental 
patterns might lead to earlier and more precise identifications of 
specific talents. Also, the extension of the Guidelines to include the
full range of academic performance might help to answer questions about
the relative importance of content and of process in particular subject
areas and for specific stages of development.

Finally, there is the psychological question as to whether intellectual development is continuous and cumulative, or discrete and taxonomic, or some combination of these two. In this latter case, which aspects of intellectual development are continuous and which are discrete, and how do they interact? Can critical phases and critical experiences be identified and matched so as to extend human capabilities?

This list is not exhaustive. It is left to the reader to extend it in keeping with his own special interests.
C Q, containing items 15 and 18, is omitted from this table.
TABLE 41

METHOD OF CALCULATING INTER-POINT DISTANCE CLUSTERS 
D was minimized in this procedure.
APPENDIX B

THE ADVANCE CLASSIFICATION OF THE ALTERNATIVES 
IN THE EXPERIMENTAL TEST 
The discussion which follows presents a detailed item-by-item account of the procedure used in the construction of this experimental test. The format of this discussion involves:

1. Giving the reading selection as it is required.

2. Giving each item in its entirety in the format in which it was given
to the examinees, except that in this discussion the categories
of the items and the foils are indicated for the convenience
of the reader.

3. The reasons for classifying each item by Bloom's Taxonomy,
and the foils by the Guidelines, are given following each item.

4. Items 19 to 24 inclusive form a special case and will,
therefore, be dealt with as a unit.

5. In the classification of foils, by arbitrary practice, no one
item contained two foils from the same category.
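Rule 5 can be checked mechanically. The Python sketch below encodes the advance classifications of the foils for items 1 to 7 as they are given in the discussion that follows, and reports any item in which a Guideline category occurs more than once. The dictionary layout and function names are illustrative only.

```python
from collections import Counter

# Advance classifications of the foils for items 1-7, as given in this appendix.
FOIL_CATEGORIES = {
    1: {"A": "OG", "C": "IA", "D": "Irr"},
    2: {"B": "Sub", "C": "OS", "D": "Irr"},
    3: {"A": "Sub", "C": "Irr", "D": "OS"},
    4: {"A": "OG", "C": "Inv", "D": "Irr"},
    5: {"A": "IA", "B": "CM", "C": "OS"},
    6: {"B": "OG", "C": "Sub", "D": "Irr"},
    7: {"A": "Sub", "B": "WW", "D": "OS"},
}


def repeated_categories(foils):
    """Return the categories that occur more than once among one item's foils."""
    counts = Counter(foils.values())
    return sorted(cat for cat, n in counts.items() if n > 1)


def check_rule(classification):
    """Map each item that breaks the one-foil-per-category rule to its repeats."""
    violations = {}
    for item, foils in classification.items():
        repeats = repeated_categories(foils)
        if repeats:
            violations[item] = repeats
    return violations


print(check_rule(FOIL_CATEGORIES))  # {} : no item repeats a category
```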



THE EXPERIMENTAL TEST

Directions for Examinees:

Answer all questions in Part I on the basis of the reading selections given.

First Reading Selection 

Source: Dexter, Lewis Anthony; The Tyranny of Schooling,
N.Y., Basic Books, p. 1.



Most people in our society at one time or another suffer 
humiliation, shame, or at least severe apprehension because of 
one great fear: they are afraid that other people may think 
that they are stupid. This fear of being regarded as stupid 
frequently underlies inferiority complexes, self-contempt, 
self-depreciation, and despair.

Our society teaches contempt for stupidity and fear of being 
regarded as stupid through one central institution and its 
auxiliaries. This institution is compulsory schooling. It is 
aided by such auxiliary practices as compulsory written 
examinations for admission to many jobs, intelligence testing, 
and the like. 

1. From the above article we may conclude that if society does
not reduce its contempt for stupidity:
(Bloom's Category 4.20)

A. Emotional problems will continue to be on the increase.
(OG)

*B. The development of creativity will continue to be
restricted.

C. Mutual co-operation will continue to be difficult to
obtain. (IA)

D. Economic power will continue to be confined to a
minority group. (Irr)

This item was classified as an analysis (4.20) item on the grounds that it requires the examinee to display "skills in comprehending the interrelationship among ideas" (Bloom: p. 206). The examinee is expected to realize that, contrary to popular myth, creative people do not display "inferiority complexes, self-contempt, self-depreciation, and despair" to the same extent as is found in the population at large. For this reason, the development of creativity and the development of contempt for stupidity would be expected to be inversely related. Since, in the stem, the variable "contempt for stupidity" does not change, it follows logically that the status of any related variable should show no change. If the examinee did not know this relationship, he should have been able to arrive at it from the logic of the foils.

Foil 1A (or 1D₁, on the basis of the symbolism used in Chapter V) suggests an increase in one variable without a corresponding increase in the other. The phrase "does not reduce" in the stem does not validly warrant the conclusion that contempt for stupidity will increase. It is adding incorrect information to the answer to suggest a change in one variable without an explicit statement concerning change in the appropriate direction in the other variable. Hence the foil is classified as an Overgeneralization (OG).

Foil 1C (or 1D₂) assumes a functional link between co-operativeness and contempt for stupidity. Unlike the case for creativity, there is no valid reason to assume such a relationship for co-operativeness. Hence this foil involves an Invalid Assumption (IA).

Foil 1D (or 1D₃) assumes a functional relationship between contempt for stupidity and the confining of economic power to a minority group. For this reason this foil could have been an Invalid Assumption (IA) except for the arbitrary rule used for classifying foils, which allows only one foil in a category per item. On the other hand, the concentration of economic power for the purpose of maintaining economic institutions is a practical necessity independent of how the society treats the individual or how the decision makers are chosen, so that this statement is true but irrelevant to the problem. Hence this foil was classified as an Irrelevancy (Irr).
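The foil codes used in this discussion (1D₁, 1D₂, 1D₃ for foils 1A, 1C, and 1D) follow from numbering the wrong alternatives of an item in order while skipping the keyed answer, so that the code does not depend on which letter is correct. The Python sketch below implements that rule; it is inferred from the codes used for items 1 to 7 in this appendix rather than taken from Chapter V, and the function name is hypothetical. Subscripts are written as plain digits.

```python
def foil_codes(item_number, key, alternatives="ABCD"):
    """Map each wrong alternative of an item to its code nD1, nD2, ...
    by numbering the foils in order and skipping the keyed answer."""
    if key not in alternatives:
        raise ValueError("key must be one of the alternatives")
    foils = [alt for alt in alternatives if alt != key]
    return {alt: f"{item_number}D{k}" for k, alt in enumerate(foils, start=1)}


print(foil_codes(1, "B"))  # {'A': '1D1', 'C': '1D2', 'D': '1D3'}
print(foil_codes(5, "D"))  # {'A': '5D1', 'B': '5D2', 'C': '5D3'}
```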

In the code used in Chapter V, the subscript in 1D₁ stands for the first distractor (foil) in item 1. The code gives a standard procedure for identifying foils without concern for which alternative is the right answer.

2. Which of the following is the most important causative
factor of contempt for stupidity? (Bloom's Category 4.10)

*A. Compulsory categorizing in school.

B. Compulsory school attendance. (Sub)

C. Compulsory written examinations. (OS)

D. Compulsory intelligence testing. (Irr)



Item 2 asks for the "causative factor," which is not stated in the selection. This item therefore requires the examinee to demonstrate his "ability to recognize unstated assumptions" (Bloom: 1956, p. 205); hence the classification of this item as analysis (4.10).

Compulsory school attendance is an enabling factor in this situation, but it is neither necessary nor sufficient. In fact, compulsory school attendance is disjunctively related to the development of contempt for stupidity, which can develop in universities where attendance is not compulsory. Also, contempt for stupidity need not develop in a compulsory school system. Any attempt at the institutionalization of an individual can lead to the development, on the part of the individual, of contempt for the forms of behavior considered "stupid" by that institution. (It can also have the opposite effect.) In any event, the replacement of the term "any" by the term "compulsory" makes this a Substitution (Sub) foil.

With respect to foil 2C (2D₂), the cause of contempt for stupidity is the "pass-fail syndrome," which compulsorily classifies a certain proportion of the population as "stupid." In this case, it is not the written examinations but the use to which they are put which leads to contempt for stupidity. Furthermore, contempt for stupidity can (and does) develop in the classroom context at times other than during examination writing, through the tacit acceptance by a student's peers of the assumption, made by the teacher, that making a mistake is "sinful." There would be no need for compulsory examinations if there were no attempt to make classifications. However, written examinations by themselves do not cause contempt for stupidity; hence this foil is an Oversimplification (OS).

In the case of compulsory intelligence testing (foil 2D or 2D₃), the issue is whether or not the results of these tests are used as part of the compulsory classification system, rather than whether or not the tests are given. Therefore, this foil as stated is an Irrelevancy (Irr).

3. The school acts as an agent for the continuance of contempt
for stupidity by: (Bloom's Category 2.10)

A. Placing too much emphasis on success in extracurricular
activities. (Sub)

*B. Reflecting the attitude that personal worth is at stake.

C. Encouraging competition between students of unequal
ability. (Irr)

D. Stressing knowledge as the only means to success. (OS)
Dexter's approach to the schools is essentially upon an emotional level. In attributing a person's inferiority complex to the school, his implication is that the basic strength of contempt for stupidity lies in its reflection upon self-esteem. This item tests the examinee's "ability to understand nonliteral statements (exaggeration)" (Bloom: p. 20T). The classification is 2.10.

In foil 3A (3D₁), the phrase "in extracurricular activities" would have to be replaced by the phrase "in academic pursuits to the exclusion of success in self-corrective activities" to be correct. This foil contains a Substitution (Sub).

Competition between students of unequal ability can lead to the continuance of contempt for stupidity, provided that the purpose of the competition is to make the less able appear stupid. It is the objective and not the fact that is critical. The fact itself is irrelevant; therefore, this foil is an Irrelevancy (Irr).

In foil 3D (3D₃), the critical aspect of this statement is "to the exclusion of success in self-corrective activities." Hence this foil is an Oversimplification (OS). Notice that foils 3A (3D₁) and 3D (3D₃) are both related to a "correct" answer which is not given in this item. It would seem perfectly legitimate, when more than one right answer is possible for a particular item, to use alternative right answers for the generation of foils.

4. The author, in charging that "society teaches contempt for
stupidity and a fear of being regarded as stupid" by means
of the school, is assuming that: (Bloom's Category 4.20)

A. The school should not be an enforcing arm for the
customs of society. (OG)

*B. The school is a more powerful socializing force than
the home.

C. The home is a more powerful socializing force than the
school. (Inv)

D. The school is an enforcing arm of the customs of
society. (Irr)

This item asks the examinees to "recognize a hidden assumption" 
(Bloom: p. 206) which makes this an analysis (4.20) item. 

If the school is at fault, it must have more influence on the child than the home has. Foil 4C (4D₂) must be an Inversion (Inv) by virtue of being opposite to the correct answer.

Foils 4A (4D₁) and 4D (4D₃) are related, since both contain the same irrelevant premise. However, 4A (4D₁) also contains the additional unwarranted value judgment "should not be." By virtue of the rule which excludes category repetition, it becomes reasonable, at least superficially, to classify 4D (4D₃) as an Irrelevancy (Irr) and 4A (4D₁) as an Overgeneralization (OG). In the case of 4A (4D₁), however, this Overgeneralization is an unreasonable extension of a statement which is itself incorrect, suggesting that second thoughts might have led to a more reasonable classification of this foil, as the results of subsequent analysis showed.

Second Reading Selection 

Source: Marris, Peter; The Experience of Higher Education,
London, Routledge & Kegan Paul, 1964, p. 175.

In this sense, it does not matter what subject a student 
studies, since each is leading towards a generalized
intellectual awareness. But the starting point is still important since
a student has the greatest incentive to understand whatever 
relates most immediately to his interests. Nor are the concepts 
derived from any one field of study equally relevant to any 
others: the ramification of insights remains biased by its 
roots. The intellectual content has to both guide and be guided 
by the purposes for which a student seeks understanding. 
Otherwise it is meaningless. 

If, then, higher education aims to teach students how to 
abstract, from a particular context, principles by which they 
can organise the perception of their universe of thought, it 
requires that these students have a use for such free-ranging 
understanding. When they enter higher education, their aims 
are confused, and they may not see, or wish to see, the value 
of a generalized intellectual skill. Their approach to learning 
has been conditioned by extraneous motives: they worked to win
approval or avoid blame, to pass an examination, as much as or 
more than for the sake of understanding. They are not used to 
asking themselves what they want to understand, or why, but 
derive enough interest to master the skills required of them 
from a desire to satisfy the authority who sets the task. So, 
I think, the function of higher education is as much to develop 
the autonomy of their desire to understand, as to satisfy it. 

5. The author suggests that a generalized intellectual
awareness can be achieved by: (Bloom's Category 3.00)

A. Focusing on progressively more difficult topics in a
subject. (IA)

B. Teaching the students how to generalize from specific
content. (CM)

C. Presenting highly abstract material which is extensive
in scope. (OS)

*D. Presenting any subject matter in any predetermined
sequence.

This item was classified as an application (3.00) item because it requires the examinee to make "use of abstractions in particular and concrete situations" (Bloom: p. 205). The right answer is also, in fact, an Oversimplification (OS), because Marris says "the starting point is important" (see p. 47). However, the use of this phrase in 5D would have produced a "clang association," which Thorndike and Hagen (1961) point out should not be used (see p. 28). Alternative 5D is, nonetheless, the most nearly correct of the four alternatives.

The first foil (5A or 5D₁) assumes that 1) some specific subjects are needed for the development of generalised intellectual awareness and 2) instruction must, of necessity, begin with the "easier" topics first. Both of these assumptions are explicitly stated as invalid in the selection. This foil was classified as an Invalid Assumption (IA).

The relationship between generalized intellectual awareness and inductive reasoning suggested in 5B (5D₂) is a very common oversimplification. With another Oversimplification foil in the same item, the classification of this foil as a Common Misconception (CM) would seem to be quite reasonable.

Similarly, the identification of generalized intellectual awareness with the transferability of content is an oversimplification of the topic of Marris' (1964) discourse. Therefore, foil 5C (5D₃) is classified as an Oversimplification (OS).



6. The purpose of developing a generalised intellectual
awareness is to: (Bloom's Category 2.30)

*A. Promote thinking ability which is not contextually
bound.

B. Enable an individual to master any subject area. (OG)

C. Stimulate thinking ability within the individual's
chosen field. (Sub)

D. Give the individual an ever-widening view of his
world. (Irr)

In order to answer this question, the examinee is expected to make an "extension of trends or tendencies beyond the given data" (Bloom: p. 205). On this basis this item was classified as Comprehension (2.30).

In foil 6B (6D₁), the mastery of "any subject area" is far too broad a statement for the purpose of Marris' (1964) discussion. Hence this foil is an Overgeneralization (OG).

In the case of 6C (6D₂), the phrase "within the individual's chosen field" is substituted for the phrase "which is not contextually bound" in the correct answer, making this a Substitution (Sub) foil.

For foil 6D (6D₃), the absence of context in the right answer renders the Weltanschauung (world view) aspect of this foil irrelevant; hence 6D (6D₃) was classified as an Irrelevancy (Irr). This foil could also be a Word-Word Link (WW) because of similarities in phraseology between "ever widening view" and "free ranging understanding" (see p. 152).

7. Of the following, the best example of generalized 
intellectual skill is: (Bloom's Category 3.00)

A. Thinking within the confines of particular subject
areas. (Sub)

B. Generalizing from the concrete to the abstract. (WW)

*C. The widely applicable technique of logic.

D. Applying abstract principle to new
situations. (OS)



The phrase "best example" in the stem led to the classification 
of this item as an Application (3.00) item. 

There are strong similarities between this item and the previous
two. For instance, the "particular subject areas" phrase is similar to
"the individual's chosen field" in item 6, so foil 7A (7D1) should
also be classified as a Substitution (Sub). With respect to 7B (7D2),
the use of induction is similar to 5B (5D2). In this case, however, the
Word-Word Link between "generalized" in the stem and "generalizing" in
the foil is made stronger by the context than in item 5.
Hence 7B (7D2) was classified as a Word-Word Link (WW).

Also, the confusion in equating transfer of training with
generalized intellectual awareness found in 5C (5D3) reappears in 7D
(7D3), which makes it reasonable to regard this foil as an Over-
simplification (OS) as well.

Third Reading Selection 



Aggression is a second behavior system that begins its growth 
during the first five years. Traditionally a response was 
labeled aggressive if the goal of the behavior was assumed to 
be psychological or physical injury to a person or person 
surrogate. We have adhered to this definition. As with depend-
ency the display of aggressive acts is a regular concomitant
of development. The slapping or pushing of an age mate, the
destruction of a sibling's new fort, and the stinging verbal
attack are regularly observed in the behavior of many children. 



Source: Kagan, J. and Moss, H. A.; Birth to Maturity,
N.Y., Wiley, 1962, p. 85.



Aggression, like dependency, is subject to socialization 
pressures, for the child does not have complete license to 
unleash his anger when he chooses. In addition, as with
dependency, the occurrence of overt aggression is a function of
both the threshold for motive arousal and the intensity of
anxiety associated with direct expression of this behavior.

In contrast to dependency, however, the potential for 
conflict over aggression is greater for females than for males. 
The pattern of social rewards and traditional sex-role standards 
act in concert to discourage the direct expression of aggression 
in girls and women. It might be anticipated, therefore, that 
aspects of aggression would be more stable for males than for
females. This is precisely what occurred, for overt aggression
to mother and frequent tantrums during childhood predicted adult
aggressivity for men but not for women.

8. If the school were to encourage tolerance for honest 
mistakes, we would expect aggression to: 
(Bloom's Category 4.10)

A. Diminish somewhat. (IA)

*B. Take different forms.

C. Disappear completely. (OG)

D. Remain unchanged. (Inv)

Although this item makes a reference to the first reading 
selection (Stupidity), the question can be answered within its own 
context. For this reason this item was not classified as synthesis but
as analysis (4.10). The logic of this item revolves around the assump-
tion of the author that aggression is an innate characteristic of human
beings which can be modified but not diminished. Thus 8A (8D1) can only
be true if this assumption is violated. Foil 8A (8D1) would seem to be
an Invalid Assumption (IA) foil in the sense that the examinee must
make an invalid assumption to select this answer as "correct." Foil 8C
(8D2) strongly overstates the same error as found in 8A (8D1). For the
same reason as 4A (4D1), this foil was classified as an Overgeneralization
(OG).

Changes in the psychological climate will lead to changes in the
modes of expression of aggression, which makes foil 8D (8D3) an Inversion
(Inv).

9. The basic position of the author in writing about aggression
is that it: (Bloom's Category 4.20)

*A. Is inevitable but can be directed through socialization.

B. Can be eliminated through the process of socialization. 
(Inv) 

C. Will result in internal conflict independent of the
environment. (Sub)

D. Is crippling to the individual by wasting considerable
energy. (CM)

This item requires the recognition of the assumption of the 
authors which was indicated in relation to item 8. For this reason the 
item was classified as analysis (4.20). 

Foil 9B (9D1) is an Inversion (Inv) for the same reason as
8D (8D3).

Aggression, as an innate behavior system, will produce internal
conflict only if its modes of expression are frustrated. Hence,
internal conflict is "dependent upon environmental conditions" rather
than being "independent of the environment." Since there is already an
Inversion (Inv) foil in this item, another category had to be found for
this foil. Comparing the two statements, "dependent upon..." and
"independent of...," the latter phrase can be treated as a replacement
for the former. Therefore, this foil (9C or 9D2) was classified as a
Substitution (Sub).

Aggression can be harmful, but as one basis for intrinsic
motivation it can be constructive as well. Foil 9D (9D3), by treating
aggression as being exclusively harmful, oversimplifies the situation.
This oversimplification is so commonly held that this foil was
classified as a Common Misconception (CM).



10. With which of the following statements concerning
aggression would the author be most likely to agree?
(Bloom's Category 6.10)

A. Aggression is like dependency in that it is harmful to
personality development. (CM)

B. Aggression generally interferes with the attainment of
educational goals. (Inv)

*C. Aggression is potentially useful for educational
purposes.

D. Aggression is considered to be a response to threats
to a person or person surrogate. (WW)

This item asks the examinees to evaluate the statements made in 
the alternatives against the information given by the authors about the 
topic; therefore this item was classified as Evaluation (6.10).

Foil 10A (10D1) was classified as a Common Misconception (CM)
on the same basis as foil 9D (9D3).

As pointed out with reference to item 9, aggression can be one
of the bases for intrinsic motivation. Hence the possibility that
aggression may be potentially useful for educational purposes may be
inferred from the selection. In this case the opposite statement, as
in 10B (10D2), must be an Inversion (Inv).

Foil 10D (10D3) is best classified as a Word-Word Link (WW) on
the basis of the phrase "person or person surrogate." This foil is
wrong because it contains the phrase "a response to," which is
extraneous to the definition of aggression (see p. 51).

11. Overt aggression would likely be decreased by: 
(Bloom's Category 3.00)

A. Blocking of many modes of aggression. (Inv)

B. Lessening the threat of punishment. (OS)

*C. Increasing the threshold of motive arousal.

D. Motivating people to rise above their peers. (RT)

This item was classified as an application (3.00) item because
it asks for a practical method of behavior change. 

Blocking of modes of aggression (11A or 11D1) would be expected
to intensify responses in the remaining available directions; hence
this would not necessarily decrease overt aggression. This foil was
classified, therefore, as an Inversion (Inv).

If the threat of punishment is lessened, overt aggression may
or may not temporarily increase, depending upon the amount of frustra-
tion which has previously developed and the way in which the threat is
lessened. If the release leads to increased frustration, the increase
in overt aggression would continue. On the other hand, if lessening
threat also lessened frustration and provided for alternative modes of
expression, overt aggression could decrease. Foil 11B (11D2) must be
considered an Oversimplification (OS) in this context.

Rising above one's peers can involve the use of aggression as
intrinsic motivation, but it can also be accomplished by the use of
overt aggression or the threat of its use (i.e., intimidation). The term
"motivating" in this sense refers to "behavior modification" rather
than motivation in the sense used by psychologists. Foil 11D (11D3)
was classified as a Redefinition of Terms (RT).

12. Aggressive behavior in female children is:
(Bloom's Category 2.30) 

*A. More likely to produce guilt feelings than in males. 

B. A less likely occurrence than in males of the same 
age. (CM) 

C. More unpredictable and is expressed differently than
in males of the same age group. (OG)

D. Less differentiated in expression than in males of the
same age. (Inv)

Once again the examinee is expected to go beyond the given
information in order to recognize the role of guilt in child rearing
practices. For this reason this item was classified as comprehension
(2.30).

Since aggression in males and females tends to take different 
forms because of sex differences in child rearing practices, guilt is 
more likely with females. Both sexes show aggression. The difference 
in mode of expression leads to the common misconception that girls are 
less aggressive than males, hence the classification of 12B (12D1) as
a Common Misconception (CM). 

The first two words in 12C (12D2) make this statement false.
The lack of predictability is from childhood to adulthood and not
across peer groups. This foil is therefore classified as an
Overgeneralization (OG).

Foil 12D (12D3) is exactly opposite to the true state of
affairs, making this foil an Inversion (Inv).

13. If we assume that the school increased its use of contempt 
for stupidity as a motivating device, we would expect that 
(Bloom's Category 5.30)

A. Parental pressure would intervene to prevent the 
school from making this change. (IA)

*B. Overt aggressive behavior would increase and 
autonomous thinking would decrease. 

C. Both academic success and generalized intellectual 
awareness would increase. (CM) 

D. The level of student motivation would decrease rather
than increase. (RT) 

This item is classified as a Synthesis item (5.30) because it
involves more than one reading selection in order to arrive at the answer.

Foil 13A (13D1) involves the assumption that parents would
oppose this move. However, contempt for stupidity could not be used at
present if it did not have at least implicit support from parents.
The major supporters of the school system, the
middle class, want to keep the "riff-raff" out of the professions as
unwanted competition for the aspirations they have for their own
children. Contempt for stupidity, as Dexter (1964) implies, is an
extremely effective method of destroying the academically unfit. It is
unlikely that powerful parents would oppose this move. This foil was,
therefore, classified as an Invalid Assumption (IA).

Contempt for Stupidity has the effect of maintaining the 
dichotomy between the academically successful and the others. Increas- 
ing this pressure would sharpen the dichotomy and would not necessarily 
increase the academic success of the survivors and would most certainly 
not increase the academic success of those who did not survive. On the 
other hand, the overt effect of this increase would be to produce an 
apparent increase in academic standards which would be expected to lead 
to the common misconception that making school achievement more diffi- 
cult improves the quality of schooling. For these reasons foil 13C 
(13D2) was classified as a Common Misconception (CM).

Foil 13D (13D3) redefines motivation in the narrow sense of
positive intrinsic motivation (i.e. interest). The use of contempt for
stupidity is, in fact, increasing the level of extrinsic aversive
motivation. This foil was, therefore, classified as a Redefinition of
Terms (RT) foil.
14. Which of the following best describes the probable
relationship between contempt for stupidity and generalized
intellectual awareness? (Bloom's Category 5.30)

A. Changes in either will have no effect on the other.
(Irr)

*B. As one increases the other will decrease.

C. Either will increase with an increase in the other.
(Sub)

D. Contempt for stupidity should be reduced and awareness
should be increased. (OG)

This item, once again, involves two selections (Stupidity and
Awareness). For this reason it was classified as a synthesis (5.30)
item. A person who is motivated by contempt for stupidity (his own and
others') would be expected to be constantly on guard against making
mistakes. Such an orientation toward his own behavior would tend to
make him intellectually cautious and hence less inclined to the
expansive thinking needed to develop a generalized intellectual
awareness. These two variables would therefore most likely be inversely
related, which accounts for the keyed answer.

Treating these two variables as unrelated, as in foil 14A (14D1),
is contradicted by the information in these two passages. This
statement could be true in another (independent) context, hence the
Irrelevancy (Irr) classification of this foil.

One part of the statement in foil 14C (14D2) is correct, the
other incorrect; hence this foil was classified as a Substitution (Sub)
foil.

Foil 14D (14D3) contains an unwarranted value judgment when only
the relationship and not its psychological importance is asked for.
This foil was classified as an Overgeneralization (OG).
The classification of the foils in this item is difficult
because three of the alternatives are based to a large degree upon logical
relations rather than errors in logic. The three possible relationships,
direct (14C), inverse (14B), and unrelated (14A), form the basis for three
of the four alternatives. It might have been more reasonable to have
classified 14A (14D1) and 14C (14D2) as Other (O) than to attempt to
establish classifications on the basis of the rather tenuous arguments
given here.



Fourth Reading Selection 

Source: Dinkmeyer, D. C.; Child Development, Englewood Cliffs,
N.J., Prentice-Hall, 1965, p. 59.

The social studies committees were working on their reports.
Doris was chairman of the southern states committee which
included Jack, Susan, and Bill. There seemed to be confusion in
this group so I decided to investigate. "Jack won't
co-operate," complained Susan. "What do you want him to do?" I
asked. Jack was frowning. "They say I have to study economic
conditions in the states, and I am interested in state
capitals," said Jack. "Did you volunteer to take economic
conditions?" I asked. "There wasn't a chance to volunteer.
We were just told her plans," answered Jack. "Is anyone
investigating the state capitals?" I asked. The children
indicated this job had not been assigned. "In that case, does
the group mind having Jack study the capitals?" No one seemed
to care. "What about the rest of you--are you all satisfied
with your jobs?" They were. Jack went to the reference shelf
and started to read.

15. From this report we may infer that the: 
(Bloom's Category 2.20) 

A. Classroom is very well equipped with instructional 
materials. (OG) 

*B. Classroom probably has moveable seats.

C. Class is studying the Southern United States. (WW)

D. Teacher favours voluntary participation. (RT)

This item involves a "reordering, rearrangement, or new view of
the material" (Bloom: p. 205), which explains its classification at the
2.20 level.

Foil 15A (15D1) overstates the situation in the phrase "very
well equipped," proposing an inference which goes much further beyond the
information in the passage than is reasonable. This foil was
classified, therefore, as an Overgeneralization (OG).

In foil 15C (15D2), only one committee and not the whole class
seems to have been studying the southern United States. Since this
item already has an OG foil, the next most reasonable classification is
a Word-Word Link (WW), on the grounds that careless reading might lead
to this word association.

In foil 15D (15D3) the term "voluntary" is used in more than one
sense. Actually, the teacher has substituted her own arbitrary
decision for Doris' decision. This foil could also have been a WW foil
except that 15C (15D2) fits this category better. For these reasons
foil 15D (15D3) was classified as Redefinition of Terms (RT).

16. If the teacher had written "Doesn't work well with others," 
as an anecdotal record for the above incident, this would 
have been: (Bloom's Category 6.10)

A. Better; it says the same thing with less words. (Irr)

*B. Worse; it fails to indicate the circumstances of the
incident.

C. Better; the details of the event are unnecessary when
judging Jack's behavior. (OG)

D. Worse; teachers are failing in their obligations in
not supplying complete information. (CM)

This item clearly asks for a value judgment based upon the
evidence in this reading selection, which makes it an Evaluation (6.10)
item.



Since it should be evident, by comparison of the two alternative
descriptions given for this same incident, that the function of an
anecdotal record is to give a clear picture of an event for future
reference, a goal of "saying the same things in less words" is
irrelevant to the task at hand. For this reason foil 16A (16D1) was
classified as an Irrelevancy (Irr).

Adding unwarranted statements concerning judgment of behavior 
makes this foil, 16C (16D2), an Overgeneralization (OG).

Once again, inappropriate value judgments are involved in 16D
(16D3), this time directed at the teacher rather than the student.
This attitude is so common that this foil was treated as a Common
Misconception (CM).

17. From the above passage we can infer that Doris' leadership 
of the group was: (Bloom's Category 4.20)

*A. Coercive.

B. Autocratic. (OG)

C. Destructive. (Inv)

D. Laissez-faire. (Sub)

In item 17 the examiner is expecting the examinee to "comprehend
the interrelationships among ideas in a passage" (Bloom: p. 206).
Hence the analysis (4.20) classification.

On the basis of the argument that a successful autocrat would
not tolerate contradiction, so that there would be no overt objection
to his or her decisions, Doris' leadership was considered "coercive"
rather than "autocratic" for the best answer. Notice, by the way, that the
teacher is a successful autocrat in this passage. For these reasons
foil 17B (17D1) was regarded as an Overgeneralization (OG).



It is not evident from the passage that Doris' leadership was
destructive. In fact, she apparently had the support of the other two
members of the committee, one of whom reported the problem to the teacher.
Being opposite to a possible best answer, foil 17C (17D2) was classified
as an Inversion (Inv).

Since Doris' attempts to coerce Jack were ineffective, she 
permitted the teacher to intervene. As a result, her later leadership 
was laissez-faire, but only under the arbitrary intervention of the 
teacher. The replacement of her later performance for her former 
performance led to the classification of this foil (17D or 17D3) as a
Substitution (Sub).

Once again, the classification of these foils is tenuous and 
open to disagreement. The format of these foils also deviates from the 
usual format of foils in this test, as in the case of item 13 and 
items 19 to 24 inclusive. It is possible that an "Other" (O) classifi-
cation of these foils would have been more reasonable.

18. From the description of the incident, we can conclude that
the teacher's handling of the incident was:
(Bloom's Category 6.10)

A. Good; she intervened to prevent a serious conflict from 
continuing. (Sub) 

*B. Poor; she allowed Jack to use her authority as a lever
to get his own way.

C. Good; she resolved the problem to the mutual
satisfaction of the group. (Irr)

D. Poor; she failed to collect sufficient information 
before enforcing a decision. (OG) 

This item created a good deal of consternation upon its first
administration. At issue seemed to be philosophical differences between
the examiner and the examinees. This problem is probably a complication
inherent in any multiple choice Evaluation (6.10) item where the
evaluative criteria are not supplied.
The problem arose essentially because many examinees insisted that the
function of the teacher was to prevent or to eliminate conflict. In
this case either 18A or 18C would be correct depending upon the
interpretation given to the phrase in the reading selection "no one
seemed to care."

The examiner, on the other hand, took the stand that the 
function of the teacher is to educate. If conflict arises the conflict 
should be used in an educational manner. In this case, the teacher 
should have found out why the topic of economics was important enough 
to the group to have engendered the conflict. Once Jack understands its 
importance he may agree to do it. That is, the teacher helps to improve 
communication. If, on the other hand, Jack remains adamant, forcing him 
to do something disagreeable to him may not help. In this case, the 
reorganization of committees more nearly upon sociometric lines might 
improve the situation. There may be some personality reasons for Jack's
behavior. In this case, the teacher's long-term role is to help Jack
cope with his own and others' personalities. Letting Jack have his own
way does not help meet this latter goal. Hence the keyed answer.

Some of the examinees argued that they had insufficient informa-
tion to answer this question. This argument was discounted because the
differences still seemed to be philosophical. In any case, the few
people who chose the keyed answer had the highest average total-correct
score, which justified retaining this item.
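The retention criterion just described (examinees who chose the keyed answer having the highest mean total-correct score) is an option-level item analysis. The following is a minimal Python sketch of that check, assuming each examinee's chosen option on a single item and total test score are available as parallel lists. The function names and the response data are hypothetical illustrations, not data or procedures taken from this study.

```python
from statistics import mean


def option_means(choices, totals):
    """Mean total-test score of the examinees who chose each option.

    choices: the option letter each examinee chose on one item.
    totals:  each examinee's total-correct score on the whole test.
    """
    groups = {}
    for choice, total in zip(choices, totals):
        groups.setdefault(choice, []).append(total)
    return {option: mean(scores) for option, scores in groups.items()}


def keyed_option_discriminates(choices, totals, key):
    """True if the examinees choosing the keyed option have the highest mean total score."""
    means = option_means(choices, totals)
    return key in means and means[key] == max(means.values())


# Hypothetical responses to one item (illustrative only, not data from this study).
choices = ["A", "C", "B", "A", "C", "B", "D", "A"]
totals = [20, 22, 27, 19, 21, 28, 17, 23]

print(option_means(choices, totals))
print(keyed_option_discriminates(choices, totals, key="B"))
```

Under this kind of check, an item whose keyed option does not attract the highest-scoring examinees, or a foil that does, would be flagged for review rather than retained automatically.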

The prevention of conflict rather than the educational use of
conflict led foil 18A (18D1) to be classified as a Substitution (Sub).
With education as a goal, the mutual satisfaction of the group (as in
18C or 18D2) is irrelevant, hence the foil was considered an Irrelevancy
(Irr).



The classification of 18D (18D3) as an Overgeneralization (OG)
is somewhat arbitrary. The teacher could have sought more information,
but the essential problem is her use of the information which she
obtained. Since 18C (18D2) was already classified as an Irrelevancy
(Irr), some other classification was needed.

Fifth Reading Selection 

Source: Prescott, D. A.; The Child in the Educative Process,
N.Y., McGraw-Hill, 1957, pp. 125-126.

Progress Report

X Attendance Area Y County Schools 

Name: Chester M.   Teacher: Miss C.   Grade: 6

Days Absent: 0   Days Tardy: 0

Reading: Is reading independently on the third-grade level and
instructionally on the fourth-grade level. Does not enjoy
reading. Finds many excuses to leave reading to do something
else. Has trouble understanding what he reads. Is better able
to find facts than to interpret facts. Has trouble finding
words in context when meaning is given.

English Language: Has a wide speaking vocabulary. Uses correct
English. Does not enjoy story writing. Understands sentence
construction.

Spelling: Learns words in spelling lessons and uses them in 
written work. Enjoys spelling. 

Writing: Spaces words well. Is practicing again on the
formation of letters. Is not neat in written work. Erases 
often. 



Arithmetic: Has worked again this year on addition, subtraction 
and multiplication. Had some trouble with subtraction. Is not 
ready for division. Has had experience with problem solving. 
Enjoys arithmetic.

Social Studies: (History, geography, and civics). Has worked 
with maps. Takes part in discussion. Showed interest in a
study of his community. Shared materials. Is trying for a 
better relationship with classmates. 

Science: Experimented with the force of air. Has become 
interested in cloud formation. Likes dogs. 

Music and Art: Listens to music. Takes part in singing and 
rhythms. Enjoys all phases of music. Works with clay, wood, 
paints, and fingerpaints. Enjoys all media of art expression.

Instructions for Questions 19 to 24

Based on the above progress report, answer the next six items by
marking:

A. If the hypothesis is supported by the facts.

B. If the hypothesis is implied by the facts.

C. If the hypothesis is refuted by the facts.

D. If the hypothesis cannot be tested by the facts.

(Bloom's Category 4.20)



Hypotheses to test : 

19. Chester is not liked by the other children; he avoids
trying to read because he doesn't want them to see him
fail.

20. Chester lacks character. He does all sorts of bad things 
and will not discipline himself to learn to read because he 
has not been punished enough. 

21. Chester is growing very slowly and really is quite immature 
for his grade. Everyone expects too much of him. 

22. Chester has no real reason to want to read, since no one
ever reads at home.

23. Chester's reading deficiency has not yet begun to affect
seriously his performance in other areas.

24. Chester's mother has kept after him about reading until he 
hates it. 

All six of these items were classified as Analysis items (4.20)
because of the hypothesis-testing characteristics of their format.
This format was used as a marker for analysis subtests. However,
because of the format, the classification of the foils became
problematic. The simple expedient of classifying all the foils
for these items as Other (O) was used.

25. The most useful suggestion to help Chester is: 
(Bloom's Category 4.20)

A. To give Chester personal warmth, acceptance and support 
wherever it is appropriate. (Sub) 

*B. To give Chester concrete help in getting started on 
specific tasks, especially in reading. 

C. To give Chester responsibilities and roles of
acknowledged importance in the daily life of the
classroom. (Irr)

D. To try to get Chester's mother to take the pressure
off him and offer him more opportunities for self-
direction. (IA)

The examinee is expected to comprehend interrelationships in
answering this item, hence its analysis (4.20) classification. Foil
25A (25D1) substitutes emotional support for corrective instruction,
hence the Substitution (Sub) classification of this foil.

The treatment suggested in 25C (25D2) has no bearing on his
academic needs; it was therefore classified as an Irrelevancy (Irr).

There is no evidence in this selection that his mother puts
unreasonable pressure on Chester; hence foil 25D
(25D3) involves an Invalid Assumption (IA).

26. If additional information on Chester is desired, and none 
of the following had been attempted, which one would 
provide the greatest amount of immediately useful 
information? (Bloom's Category 5.20)

A. An interview with Chester's previous teacher. (O)

*B. An interview with the parents.

C. A diagnostic test in reading skills. (OS)

D. A request for the assistance of a guidance counselor.
(Irr)

This item involves the examinee generating a structure to
represent Chester's entire situation by inductive reasoning prior to
answering the question. For this reason, this item was regarded as a
synthesis (5.20) item.

The best first-hand source of information about Chester is his
parents. The next best is his previous teacher. Since the Guidelines
do not make any provision for this kind of relationship, foil 26A (26D1)
is best classified as "Other" (O).

A diagnostic test in reading skills is only useful to a teacher
who knows enough about these tests and the reading problems they
diagnose to be able to use them effectively. Also, administering and
interpreting such tests can be time consuming. It is an Oversimplifica-
tion (OS) for foil 26C (26D2) to suggest this course of action as
superior to any other.

Most of the examinees were experienced teachers, hence it was
reasonable to assume that they would know from experience that guidance
personnel rarely can give a teacher information they do not already
know. This effect occurs because test batteries are rarely more
reliable than a month or two of sensitive observation by a teacher.
Therefore, the course of action suggested in foil 26D (26D3) is
classified as an Irrelevancy (Irr).

27. A reasonable conclusion which can be drawn from this 
report is that: (Bloom's Category 4.10)

A. Chester's problem stems essentially from his poor
relationship with his mother. (IA)

B. Chester's problem stems essentially from his poor
relationships with his peers. (Irr)

*C. Chester's problem has no single cause and no simple
solution.

D. Chester's problem stems from such a wide range of
sources that a classroom solution is impossible. (CM)

This item is somewhat difficult to classify because "drawing
conclusions" is not part of Bloom's Taxonomy. However, this item also
involves the examinee's "skill in distinguishing facts from hypotheses"
(Bloom: p. 205), hence the analysis (4.10) classification of this item.

Once again, the mythical poor relationship with his mother is
introduced in foil 27A (27D1). In this context the most reasonable
classification of this foil would be an Invalid Assumption (IA).

Chester's problem seems to be centered upon his reading
difficulty. His relationship with his peers may influence his motiva-
tion to attempt improvement, but is irrelevant to his problem. There-
fore, foil 27B (27D2) was classified as an Irrelevancy (Irr).

Foil 27D (27D3) overgeneralizes to such an extreme that the most
reasonable classification for this foil was Common Misconception (CM).

28. In Chester's progress report, which one of the following is 
the most important factor contributing to his difficulty 
with school achievement? (Bloom's Category 5.30)

*A. Aggression which is building up due to frustration
over his reading development.

B. His inability to develop a generalized intellectual
awareness. (Irr)

C. His weakness in reading which is affecting all areas 
of learning. (OG) 

D. The teacher has been using "contempt for stupidity" as 
a motivating device. (IA) 

This item, once again, involves more than one reading selection
and was therefore treated as a synthesis (5.30) item.

Foil 28B turned out to bs somewhat unreasonable because the 
author assumed that the examinee would be able to identify the fact that 
generalized intellectual awareness is an adult phenomena. The evidence 
is tenuous since the Awareness selection from its title is related to 
university education, and Chester in this (Progress) selection is in 
Grade six. This foil would be an Irrelevancy (Trr) but it may have been 
unreasonable to expect so tenuous a connection to be made if it could 
not be assumed, that the examinees would know this fact. 

The progress report indicates that some areas of Chester's
development are not out of step, which makes foil 28C (28D₂) an
Overgeneralization (OG).

Foil 28D (28D₃) suggests that this teacher is using contempt for
stupidity for motivation, which may be an Invalid Assumption (IA), since
the tone of the report is supportive rather than condemnatory.

29. On the basis of the foregoing, which of the following seems
to be the most important consideration when preparing
anecdotal records or progress reports?
(Bloom's Category 5.30)

A. Make no attempts at interpretation since your judgments
are probably biased. (CM)

*B. Present as much information as possible about all the
salient aspects of the situation.

C. Be as brief as possible, giving no information which
may cloud the central problem. (OS)

D. Do not put anything into these reports which might
antagonize the child's parents. (RT)

This item also involves more than one reading selection and
therefore it was classified as a Synthesis (5.30) item.

In the case of foil 29A (29D₁), it is impossible to avoid
judgments in any reporting; hence this advice is a Common Misconception
(CM). Because of the problem of observer bias, as much pertinent
information as possible should be supplied so that alternative
interpretations can be considered by others. This latter statement
makes foil 29C (29D₂) an Oversimplification (OS).

In foil 29D (29D₃) the term "report," referring to a "progress
report," is redefined by the suggestion that such a report may become
public property, i.e. that it will be part of the "report card" sent to the
parents. This approach involves a Redefinition of Terms (RT).

30. The most important principle illustrated in this set of
questions is that the teacher should:
(Bloom's Category 5.30)

A. Promote generalized intellectual awareness in
aggressive children by using contempt for stupidity
as a motivating force. (WW)

*B. Recognize that developmental deficiencies arise from
complex circumstances, requiring multiple-strategy
solutions.

C. Seek professional assistance from the school counselor
in the identification of developmental problems. (Tr)

D. Recognize that "contempt for stupidity" is not
necessarily an effective way of generating motivation
in pupils. (RT)

This item synthesizes the previous twenty-nine items, which led to its
Synthesis (5.30) classification.

Foil 30A (30D₁) is a good example of a contrary-to-fact
statement developed by the glib repetition of phrases from the
reading selections. It illustrates very well the way in which Word Word
Link (WW) foils might be generated.

The discussion concerning the role of the counselor which 
occurred on page 66 suggests that it would be more reasonable to get the 
teacher to identify the problem and then get the professional's help in 

This success ratio of one out of three is equivalent to that of the
experimenter (see: p. 75).

It should also be noted that the nature of the examination was,
at least in part, predetermined by the examiner's philosophy of
education. This characteristic of the examination is evident, in the
first place, in the nature of the reading selections upon which the
examination is based. Second, it is evident in the nature of the
questions asked concerning these reading selections. Third, it is
evident in the reasons given for the classification of items and foils.
In general, it is hoped that the major portion of this bias is confined
to the nature of the reading selections used and that, once these are
given, the astute reader should be able to infer the bias and answer
accordingly. The possible exceptions which are clearly evident are
item 18 and item 28, particularly with respect to foil 28B (28D₁).

It is being argued that bias is unavoidable and that its adverse
effects upon student performance can, hopefully, be minimized but not
eliminated. The clear-thinking student should be able to recognize and
adopt a number of points of view concerning any particular subject matter
and apply logic, once a point of view is assumed, in order to arrive at
reasonable conclusions. So long as the logic which follows from the
assumptions cannot be faulted, the system itself can remain intact.
The purpose of presenting the development of the experimental test in
such detail was to expose the logic of the test, including the reasons
why the foils are considered to be wrong (i.e. its construct validity),
to such reasonable attacks as may be made. If the construct validity
of the test is supported on both logical and evidential grounds, then
the experimental test may be regarded as an effective measuring
APPENDIX C

LOGICO- SEMANTIC ANALYSIS OF RIGHT 
AND WRONG ANSWER CLUSTERS 

This appendix presents a detailed discussion of the items and 
foils which differed in their advance classification from the classifica- 
tion of the clusters in which they occurred. A cluster was classified by 
the most frequently recurring advance classification in the cluster. 

The findings for this part of the study were summarized on pages
70 to 71 for the right answer clusters and on pages 71 to 79 for the
wrong answer clusters. This detailed analysis is given here for two
reasons. First, it was felt that the effective reclassification of
alternatives represented evidence in support of the multiple
interpretation hypothesis. Second, it was felt that subsequent researchers
might find value in an independent evaluation of the logic which led to
the conclusions this study has presented.

The Meaningful Interpretation of Item Clusters

In an exploratory study into a new area of research, the
relative relevance of characteristics can be expected, in general, to be
unknown. For this reason, the failure of the advance classification of
items to provide much assistance towards a meaningful interpretation of
the data was disappointing, but not surprising.

On the other hand, the cluster solution used replicated the
advance classification by Bloom's Taxonomy to the extent that per
cent of the items which appeared in a single cluster also held a common
advance classification. Table 11 (see: p. 73) gives the clusters from
Group A and indicates which items were in a common classification, the
classification of these items, and the final interpretation given to
each cluster.
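The decision rule used throughout this appendix (a cluster takes its most frequently recurring advance classification) can be expressed as a short routine. The sketch below is illustrative only: `modal_classification` is a hypothetical helper, not part of the original analysis, and it also reports the share of members holding the modal class and whether the mode is tied. A tie is the situation of clusters in which no two members share a classification.

```python
from collections import Counter

def modal_classification(advance_classes):
    """Return (modal class, share of members holding it, tied?) for one cluster.

    `tied` is True when another class is equally frequent, e.g. when no two
    members of the cluster share an advance classification.
    """
    ranked = Counter(advance_classes).most_common()
    top_class, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    return top_class, top_count / len(advance_classes), tied

# A four-item cluster shaped like C1: three Analysis items and one Synthesis item.
print(modal_classification(["Analysis", "Analysis", "Synthesis", "Analysis"]))
# ('Analysis', 0.75, False)
```

A share of 1.0 corresponds to clusters whose advance classification interprets them directly, a share above one half with no tie to majority clusters, and a tie to clusters that must be interpreted on other grounds.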

Where the interpretation remained ambiguous in Table 11
(see: p. 73), the uncertainty is indicated. It would be a fairly simple
matter with clusters like C₆ and C₁₀, for instance, to assume that the
advance classification of these items adequately interprets the cluster.
In other cases, such as C₁, C₅ or C₇, the majority of the items
were in a common class. If the class of the majority is used as an
interpretation, the members which did not share this common
classification must be explained.

Finally, there were several clusters, C₂, C₃, C₄, C₈, and C₉,
which did not contain even two members which shared a common advance
classification. The interpretation of these clusters would seem most
problematic, but must be attempted.

Several considerations were used in an attempt to arrive at an
unambiguous meaningful interpretation of each cluster. The first and
most obvious one was the advance classification of the items by Bloom's
Taxonomy.

Second, the possibility that some clusters might be content clusters
could not be entirely discounted.

Third, there was the possibility that items might have been
misclassified in either of two ways. First, the aspect of the item which
leads to the misclassification might be related to some obvious but
irrelevant format characteristic. This problem has already been
illustrated by the case of items 19 to 2[?] inclusive (see: Appendix B,
pp. 175-177). Alternatively, there may be a discrepancy between the way in
which the examiner intended that the item be interpreted and the way it
was, in fact, interpreted by the examinees. For instance, an item which is
a comprehension item for some students may well be an analysis item for
others. This latter possibility would suggest that the classification
of items might be better made after their characteristic clustering has
been determined than before the test is given.

Fourth, since this study is postulating that the foils may have
some effect upon the interpretation of the item and, therefore, upon the
way in which it is answered, the nature and selection ratio of the foils
should also be taken into account when an attempt is being made to
interpret the clusters. In this respect, foils with a selection ratio
of .05 or less were dropped from this and subsequent analyses, since
these foils were selected by too few people for the statistics pertinent
to them to be stable.
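The selection-ratio cutoff amounts to a simple filter. In the hypothetical helper `stable_foils` below, a foil's selection ratio is taken to be the proportion of all examinees choosing it, so that the correct answer and the foils of an item sum to 1.00, as they do for item 25 in Table 52. The labels such as "25D3" are an ASCII stand-in for the 25D₃ notation and are assumptions of the sketch.

```python
def stable_foils(selection_ratios, cutoff=0.05):
    """Split foils into those kept and those dropped by the selection-ratio rule.

    `selection_ratios` maps a foil label to the proportion of all examinees
    who chose it; foils at or below `cutoff` are dropped as unstable.
    """
    kept = {foil: p for foil, p in selection_ratios.items() if p > cutoff}
    dropped = sorted(foil for foil in selection_ratios if foil not in kept)
    return kept, dropped

# Foils of item 25 (selection ratios .24, .19 and .02, as in Table 52).
kept, dropped = stable_foils({"25D1": 0.24, "25D2": 0.19, "25D3": 0.02})
print(kept, dropped)  # {'25D1': 0.24, '25D2': 0.19} ['25D3']
```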

Finally, there were other sources of information about these 
items, such as the interitem phi correlation coefficient matrix and the 
item consistency which might have proved useful in the attempt to make 
an unambiguous interpretation of each of the item clusters. 

In the discussion which follows, each of the ten item clusters
is dealt with in turn in an attempt to establish an unambiguous
interpretation for each cluster. In advance of each discussion a table
appears which supplies the following information: 

1. The numbers of the items in the cluster.

2. The subject matter content from which each item is drawn.

3. The advance classification of each item.

4. The biserial correlation (r_bis) of each item with the total
test score.

5. The difficulty of the items.

6. The selection ratio for each foil and the advance
classification of each foil for the items in the cluster.
The foils which are dropped are also indicated.
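The biserial correlation in item 4 above is not defined in the text. A common textbook form is r_bis = ((M_p - M_q) / s_t) × (pq / y), where M_p and M_q are the mean total scores of those answering the item right and wrong, s_t is the standard deviation of total scores, p is the proportion answering right (the item difficulty), q = 1 - p, and y is the ordinate of the standard normal curve at the point dividing p from q. The sketch assumes this form and a population standard deviation; the study may have computed it differently (for example, from tables), the helper name `biserial` is hypothetical, and the scores in the usage example are invented.

```python
from statistics import NormalDist, mean, pstdev

def biserial(item_correct, total_scores):
    """Biserial correlation between a right/wrong item (1/0) and total score.

    Assumes a normally distributed latent variable underlies the right/wrong
    split; with non-normal data the value can exceed 1.
    """
    right = [t for c, t in zip(item_correct, total_scores) if c == 1]
    wrong = [t for c, t in zip(item_correct, total_scores) if c == 0]
    if not right or not wrong:
        raise ValueError("item must have both right and wrong responses")
    p = len(right) / len(total_scores)              # difficulty: proportion right
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(q))   # normal ordinate at the p/q split
    return (mean(right) - mean(wrong)) / pstdev(total_scores) * (p * q / y)

# Invented data: 1 = answered the item correctly; totals are total test scores.
item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
totals = [16, 13, 15, 12, 14, 12, 10, 13, 11, 9]
print(round(biserial(item, totals), 3))  # 0.912
```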

Any other information needed for the discussion is supplied in
the context. Table 42 follows on page 190.

Three of the four items in Table 42 have their content clearly
drawn from the Stupidity reading selection on pages 41 and 42, and the
fourth one, item 8, has a reference to this selection in its stem.
However, item 8 can be answered without having read this selection, since
a good student should be able to infer what is meant by the phrase
"contempt for stupidity" from the context. Also, foil 28D₃ was the only
part of item 28 which contained a reference to this selection but, once
again, it should be possible to infer the meaning from the context.
Furthermore, foil 28D₃ is classified as an IA (Invalid Assumption) foil,
the invalid assumption for which can be arrived at without reference to
the "Stupidity" selection. In addition, this cluster does not exhaust
the items in which a reference to this selection is made: there are
five other such items. For these reasons, it is not possible to
interpret this cluster unambiguously on the basis of content.

The relative magnitude (.378 to .456) of the r_bis (biserial
correlations with total scores) varies sufficiently to suggest that they
were probably not related to the statistical artifacts which caused
these items to form a cluster. Also, since none of the difficulties or
selection ratios (Dₙ) is large enough that these items must (of
necessity) overlap, the ratios cannot be considered to be relevant in
this event either.

Since three of the items were classified in advance as Analysis
items, whereas the fourth one (item 28) is Synthesis as this class was
defined, the reasonableness of retaining the Analysis classification for
this cluster is greater than that of changing it to a content-oriented
classification. Also, as has already been noted, item 28 has foils
identical by class with those of item 1. Furthermore, in three of these
items (1, 8, 28) the most commonly selected foil has an advance foil
classification of OG (Overgeneralization). In this case, if 2D could
reasonably be reclassified from Substitution to OG, the most commonly
selected foil of every item in the cluster would be an OG foil. An
alternative to the reclassification procedure is to suggest that the
categories of foil given above may not be independent. In this case, the
common element of these items would be the common classification of the
most commonly selected foil. This argument would be strongly supported if
all of these foils fell into a common wrong answer cluster, which they did
not do (28D₂ is the exception). It is reasonable to be reluctant to
classify a cluster derived from the right answer correlation matrix of
Group A on the basis of the performance of foils to the items when
performance on specific foils is not part of the statistical basis from
which this cluster is derived. In this cluster, the classification was
(reluctantly) retained as Analysis, even though this meant reclassifying
item 28. Table 43 follows on page 192.

To begin with, in Table 43 there was no consistency between the
items in Cluster C₂ with respect to their content (information
background) which might have accounted for the formation of this cluster.
A similar statement can be made for the advance item classification, for
the relative magnitude of the r_bis and the coefficients, and for the
advance classification of the foils.

Superficially, then, there would seem to be no basis for the
interpretation of this cluster. However, an examination of the foil
classification for item 30 is revealing (see: pp. 181, 182), and 30D₃ was
an RT (Redefinition of Terms) foil. Two, and possibly all three, of these
foils are related to Comprehension-type operations (i.e. they are
Misreading-type foils). If the foils of an item could be eliminated by
comprehension-type strategies, the fact that the stem-right-answer
relationship involved a synthesis-type relationship may have been
irrelevant. Similarly, if the stem-right-answer relationship can be
recognized by comprehension-type strategies without having to eliminate
foils, an item may be a comprehension item with high level foils. Such
a combination of arguments could account for item 3 and item 30 occurring
in this group. In order to account for the presence of item 17 in this
cluster, it is necessary to suggest that the OG, OS, etc., type foils
are related to analysis-type strategies. For some individuals, then,
item 3 could have been treated as an analysis item because of the high
level of the foils. In this case, item 17 would have to have most of
the examinees who selected the correct answer in the top one-third of
the group as defined by total score correct, and there would have to be
a high phi coefficient between items 3 and 17. The results support this
contention. In the first place, about 80 per cent of the individuals
who answered the item correctly were in the top 40 per cent of the
group. In addition, the phi correlation coefficient between item 3 and
item 17 is .281, which is significant at a probability level of
.02 > p > .01.
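The phi coefficient between two items, and the significance statement attached to it, can be illustrated as follows. This is a sketch rather than the study's computation: `phi_coefficient` and `phi_p_value` are hypothetical helpers, the test relies on the standard result that n·phi² is approximately chi-square with one degree of freedom under independence, and the group size passed in is an assumption, since the number of examinees in Group A is not given in this section.

```python
from math import sqrt
from statistics import NormalDist

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table of two right/wrong items:

                    item Y right   item Y wrong
    item X right         a              b
    item X wrong         c              d
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def phi_p_value(phi, n):
    """Two-tailed p-value for H0: phi = 0, using n * phi**2 ~ chi-square(1),
    i.e. z = |phi| * sqrt(n) referred to the standard normal distribution."""
    z = abs(phi) * sqrt(n)
    return 2 * (1 - NormalDist().cdf(z))

# Invented 2x2 counts for two items answered by 75 examinees.
phi = phi_coefficient(20, 10, 15, 30)
print(round(phi, 3), round(phi_p_value(phi, 75), 4))
```

With a hypothetical group of 75 examinees, for instance, phi = .281 gives a p-value between .01 and .02; the bracket obtained for the actual data depends on the real group size.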

These results did not make possible the unambiguous
interpretation of Cluster C₂. On the contrary, the interpretation would
seem to be that this cluster involves multiple strategies, some at the
comprehension level and some at the analysis level. Hence C₂ cannot
be defined in terms of a unitary category from Bloom's Taxonomy.

Once again, the data suggest right-wrong answer interactions.
In addition, there appeared to be a multiple-strategy level involvement
in the cluster.

Cluster C₃, Table 44 (see: p. 195), proved to be somewhat
similar to C₂ in that the same phenomena occurred once again. All the
initial bases for interpreting this cluster failed to provide for an
unambiguous decision. In addition, the high level (Synthesis) item
seemed to be lowered by virtue of the low level foils.

It may be reasonable to relate items 4 and 6 because of the fact
that was reclassified as a CM (Common Misconception) in the interpretation
of the wrong answer clusters and 4D became unclassifiable. It is
possible that these are both low level foils which might lower the
Analysis classification of item 4 to Comprehension. This argument
concerning the reclassification of item 4 must remain inconclusive, since
a classification of 4D was not established.

The fact that these two clusters (C₂ and C₃) proved to be
similar raises the question as to why they did not form a single
cluster. Obviously, the items from one cluster did not correlate
highly with the items from the other, but this fact does not add any
information, since it is this fact which is the statistical basis for
the formation of clusters. A look at the wrong answer clusters
(indicated by W) into which the foils fell proved illuminating.

Table 45 (see: p. 196) shows that there is a degree of
similarity within C₂ of the wrong answer clusters (the same wrong answer
cluster occurs in two items). C₃ has a high level of similarity among the
foils (W₂ occurs in all three items and another wrong answer cluster in
two). There are no common wrong answer clusters between the two groups.
This evidence would have been much more conclusive as to why these
clusters were distinct if the foil groups in C₂ were more strongly
similar. Once again, however, there is some indication that foils may
influence the formation of right answer clusters.

In any event, the fact that there seem, once again, to be
multiple strategies involved in this cluster means that it cannot be
unambiguously classified.
As Table 46 shows (see: p. 198), there is no clear basis for
the unambiguous classification of Cluster C₄ from any of the sources of
data being used. This cluster, therefore, remained unclassified.

On the other hand, item 5, in addition to involving a practical
application, also involves "going beyond the given data to determine
implications . . . which are in accordance with the conditions described
in the original communication" (Bloom's Taxonomy, 1956, p. 205). This
could mean that the best Bloom's Taxonomy classification for this item,
if the application aspect is ignored, is Comprehension (2.30:
Extrapolation). This item may, therefore, be capable of a dual
classification. Item 19 is the only one in this series in which the
correct answer involves the implication of the statement by the reading
selection. Item 14 also involves extrapolation, except that the
extrapolation is from two selections rather than one, making this item
(arbitrarily) a Synthesis item. It is too early in the development of
this testing technique to be dogmatic, on a post hoc basis, about the
interpretation, on a strategy basis, of any cluster. This statement is
particularly reasonable since this clustering does not cross-validate
(see: p. 94). However, it does suggest that a better definition of
the multiple strategies which may be involved in the answering of
multiple choice items of the types represented on this test might
improve the effectiveness of the advance classification of the items,
and most certainly would improve the interpretation of the clusters
which were found to be peculiar to a particular group of examinees.

In Cluster C₅, Table 47 (see: p. 199), the common element, on
the basis of the decision rule that the cluster be identified by the
most frequent process category from the advance classification, would be
Analysis. Item 7 was originally classified as an Application item
because the stem asked for "the best example." In other items the
lowering of the level of performance of an item by low level foils
was observed. In this case the popularity of the analysis-related OS
foil may have had the opposite effect. Furthermore, this OS foil was
classified during the interpretation of the wrong answer clusters as NS
(Non Sequitur) (see: p. 225 for definition), which was one of the three
new foil categories which came out of these discussions. Since both of
these categories (OS and NS) seemed to be more related to the logic of
the item than to its semantics, these types of foil may help to define
analysis type items. This possibility was strengthened by the fact that
about 73 per cent of the examinees who chose the WW foil of item 7 were in
the bottom 60 per cent of the group, whereas about 65 per cent of the
examinees who chose the OS foil were in the top 60 per cent of the group.
These figures suggested a moderate but definite trend on the part of these
foils to move the performance of this item upward in level. The same trend
is evident in the average total-correct values for each of these foils and
for the right answer. Those who chose the WW foil had an average total
correct score of 11.6, while those who chose the OS foil had an average
total correct score of 12.2, and those who chose the correct answer had an
average total correct score of 13.7. The average score on the entire test
was 12.2.
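The comparisons in this paragraph (the share of those choosing each alternative who fall in the top part of the group, and the mean total score of each choice group) can be reproduced from raw responses roughly as below. The helper `choice_profile` and its `top_fraction` parameter are hypothetical; ties at the cut-off score are all counted in the top group, which is only one of several reasonable conventions, and the data in the usage example are invented.

```python
from statistics import mean

def choice_profile(choices, totals, top_fraction=0.6):
    """Map each alternative to (mean total score of those choosing it,
    share of them in the top `top_fraction` of the group by total score)."""
    ranked = sorted(totals, reverse=True)
    k = max(1, round(top_fraction * len(totals)))   # size of the top group
    cutoff = ranked[k - 1]                          # lowest score in the top group
    profile = {}
    for option in sorted(set(choices)):
        scores = [t for ch, t in zip(choices, totals) if ch == option]
        in_top = sum(1 for t in scores if t >= cutoff)  # ties at cutoff count as top
        profile[option] = (mean(scores), in_top / len(scores))
    return profile

# Invented responses to one item: "key" is the correct answer.
choices = ["key", "key", "OS", "WW", "OS", "WW"]
totals = [15, 14, 13, 10, 11, 9]
for option, (avg, top_share) in choice_profile(choices, totals, top_fraction=0.5).items():
    print(option, avg, top_share)
```

An alternative whose choosers have a higher mean total score and a larger top-group share than another foil's choosers behaves like the OS foil of item 7 relative to its WW foil.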

The basis for the upgrading of item 7 to the Analysis level is
somewhat tenuous. The fact that it would otherwise be the only anomaly
in this cluster strengthens the use of the decision rule. The use of
the rule is further strengthened by the fact that none of the possible
bases being used for interpretation gives a more reasonable explanation
for this cluster. The Analysis classification of this cluster was
retained.

Cluster C₆, as summarized in Table 48 on page 202, was
classified as Analysis, since both its members were so classified in
advance. However, the procedures being used suggest at least two other
bases for interpreting this cluster. First, the difficulties (D₀
selection ratios) were high. Second, the most commonly selected foils
fell into the same wrong answer cluster. These two events could
be related to each other, since the advance classifications of the foils
in their respective items are different.

In any event, C₆ was treated as an Analysis cluster in
subsequent statistical analysis.

In Table 49 on page 203, none of the bases being used for
interpretation assists in the explanation of the formation of Cluster C₇
except the arbitrary rule that the majority of the items shared a
common classification in advance. In both these items (item 10 and
item 16) a value judgment is specifically asked for in the stem and
explicitly included in each of the alternatives. The other item
(item 18) which had these same characteristics fell into another
cluster. Item 29 asks for "the most important consideration" in the
stem, which makes the stem contain an explicit request for a value
judgment; however, this value judgment is not explicit in the
alternatives. The term "likely" occurs in the stem of item 11, suggesting,
perhaps, that an implicit value judgment may be involved in this stem.
Should the definition of the Evaluation level of items as used in this
study be extended to include implicit as well as explicit value
judgments? The results of the cross-validation part of the study
suggested that Group B responded quite differently to these items.

In Appendix B, where Item 26 from Cluster C₈ in Table 50
(see: p. 205) was discussed (see: pp. 177, 178), it was pointed out that
an inductive structure had to be generated in the mind of the examinee
in order to answer this question, which led to its classification as a
Synthesis item. All other Synthesis items had the additional
characteristic of involving more than one reading selection. The use
of the device of having more than one reading selection as a basis for
Synthesis items did not generate a unique cluster. On the other hand,
if the logic of Item 12 and Item 20 is examined, it becomes evident
that a process of reasoning similar to that of Item 26 may have been
involved in these two items as well.

The classification of Item 20 as Analysis was made because it
is clearly structured so as to involve a hypothesis testing procedure.
If each of the alternatives in Item 12 and Item 26 were also regarded as
hypotheses to be tested against the inductive structure
employed by the examinees in Group A, the answering of these items may
have been relatively homogeneous.

This cluster (C₈) could be classified as Synthesis on the grounds used
for the advance classification of Item 26, or as Analysis on
the grounds used for the classification of Item 20. However, in
multiple choice format it is impossible to avoid the hypothesis testing
aspects of a Synthesis item when a specific set of alternatives is given
in the item. The type of item in this cluster came about as close to a
Synthesis level as it may be possible to come in multiple choice items.
Bloom (1956) suggests that if ambiguity of classification occurs, it
should be resolved in favour of classifying to the highest possible
level. For this reason, on the basis of the performance of these
items (at least within Group A), the items in this cluster have been
reclassified as Synthesis items.

Both Cluster C₉ items, as summarized in Table 51 (see: p. 207),
are from the same reading selection. Also, the Procrustes rotation to
content suggested a possible content basis for this cluster. However,
both of them also proved to be very poor items on the basis of the
combination of their r_bis and difficulty (D₀ selection ratios). In
addition to this, their most common foils occurred in the same wrong
answer cluster (W₇), and the Dₙ selection ratios of these two foils were
very high. It is probable that these two items form a cluster, at least
partly, on the basis of the relationship between these two foils. This
cluster remained unclassified in subsequent statistical analysis.

There would seem to be two possible bases for the interpretation
of Cluster C₁₀, as summarized in Table 52 (see: p. 208): content and
advance classification. It is possible that both of these factors were
operative to make this cluster distinct from other Analysis clusters.
Some support for this argument may be found in the fact that C₁₀ was
positively correlated with all the other classified clusters in the
Procrustes rotation analysis (see: Table 10, p. 69) except one, although
all of these correlations may have been too small to have much meaning.

TABLE 52

SUMMARY OF DATA FOR CLUSTER C₁₀

Item   Item       Advance           r_bis   Selection ratios            Advance foil classification
No.    Content    Classification            D₀     D₁     D₂     D₃     D₁     D₂     D₃

21     Progress   Analysis          ?       .32    .0?    .3?    .23    O      O      O
25     Progress   Analysis          .329    .55    .24    .19    .02    Sub    Irr    X
24     Progress   Analysis          .317    .86    .01    .0?    .06    X      O      O

D₀ is for the correct answer.
X means foil is dropped.

In summary, then, this interpretation attempt upheld the
Analysis level classification of Clusters C₁, C₅, C₆, and C₁₀ as they
emerged in the comparison between the results of the minimal interpoint
distance cluster analysis and the advance classification of the items.
It similarly upheld the classification of C₇ as Evaluation. The
procedure led to the reclassification of one cluster (C₈) from
undetermined to Synthesis. In the remaining clusters (C₂, C₃, C₄, and
C₉) it was impossible to provide an unambiguous basis for classifying
these clusters from the available data. Hence these clusters remained
unclassified. Clusters C₂ and C₃ seemed to involve some form of multiple
strategies and seemed to be a Comprehension level cluster for Group A
if the superficial characteristics of the items which led to their
advance classification were ignored. Since this cluster did not reappear
in Group B, multiple strategies may be involved across groups of
examinees.

Only in the case of C₉ can content be said to be a more
reasonable interpretation than some form of single or
multiple strategy. Even in this case, the content interpretation is in
some doubt, suggesting that this test is essentially "process oriented,"
as intended.

Of the 30 items on the test, [?] items correctly formed at least
pairs in the clusters which emerged. This "interpretability" figure was
improved to 19 items (63 per cent) on the basis of the interpretation
procedure used.

The Meaningful Interpretation of Wrong Answer Clusters

A similar interpretative procedure was used for wrong answers
as for the items. The considerations were taken in the following order:
1) advance classification, 2) information background (content), 3)
statistics of foils, 4) logico-semantic analysis. A cluster was again
assumed to be identified by the most frequently recurring advance
classification. Foils of the O (Other) category were assumed to be part
of this common classification.
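The rule for wrong answer clusters differs from the item rule in one respect: O (Other) foils count toward whatever named class is most frequent. The hypothetical helper `cluster_foil_class` sketches this under the assumption that ties among named classes go to the class encountered first (the text does not say how ties were resolved). The usage line takes the advance classifications of the five foils in wrong answer cluster W₁ as listed in Table 53.

```python
from collections import Counter

def cluster_foil_class(foil_classes, wildcard="O"):
    """Return (modal advance class, share of foils compatible with it),
    treating `wildcard` (O = Other) foils as compatible with any class."""
    named = Counter(c for c in foil_classes if c != wildcard)
    if not named:
        return None, 0.0
    label, count = named.most_common(1)[0]   # ties: first class encountered wins
    supporting = count + sum(1 for c in foil_classes if c == wildcard)
    return label, supporting / len(foil_classes)

# Advance classes of 1D1, 2D, 22D, 8D1 and 17D1 (wrong answer cluster W1).
print(cluster_foil_class(["OG", "Sub", "O", "OG", "OG"]))  # ('OG', 0.8)
```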

Of these four, the only one related to the logico-semantic
characteristics of the foils was their advance classification. If these
clusters could not be accounted for in other ways, and the logico-semantic
characteristics of the foils could be shown to be a possible basis
of cluster membership, then this latter basis may be the best available
interpretation. It has already been shown that the advance
classification of both right answers and foils did not survive very well
in the cluster analysis. It has also been shown that this classification
can be improved by examining the clusters for some relatively unambiguous
basis for interpretation. In the case of foils, this involved a
re-examination of their logico-semantic structure. The five bases used in
an attempt to interpret the wrong answer clusters were, once again:

1. The advance classification of the foil. 

2. The content of the foil. 

3. The selection ratio of each foil. 

4. The relationship between right answer and wrong answer
clusters.

5. The reconsideration of the logico-semantic structure of the
foils.

In some cases, material from other sources, such as Powell and
Isbister (1969), was used to assist in this interpretative process.
Finally, one cluster was completely lost by virtue of the low
selection ratios among all of its members. In addition, two other
clusters were reduced to single members by this procedure. In the
discussions which follow, some tentative attempts are made to account for
the "special cases" as well as the "general trends" in each cluster.

For the convenience of the reader, whenever a particular foil is
being discussed for the purpose of reclassification, the stem of the item
and the particular foil are both given ahead of the pertinent discussions.
Table 53 on this page gives information for the interpretation
of wrong answer cluster W₁.

TABLE 53

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁

Foil     Content       Advance Classification     Dₙᵃ

1D₁      Stupidity     OGᵇ                        .83
2D       Stupidity     Sub                        .61
22D      Progress      O                          .84
8D₁      Aggression    OGᵇ                        .58
17D₁     Discipline    OGᵇ                        .80

a. The symbol Dₙ refers to the selection ratio.
b. Foils which had appeared in a common category in the advance foil
classification.



The foils in wrong answer cluster W1 are drawn from a broad spectrum
of the test's content, ruling out content as a possible basis for the
interpretation of this wrong answer cluster.

The phi coefficients upon which this cluster is based depend
upon the size of the overlap between particular pairs and upon the
marginal totals. When the selection ratios for two alternatives both
exceed .50, there is a tendency for the range of phi to be shifted
positively. In this case, all of the selection ratios in the cluster
exceed .50, and for this reason the sizes of the selection ratios could
be a contributing factor to the statistical formation of this cluster.
If this were the only factor, however, it would have been reasonable to
expect more of the other six foils with selection ratios greater than
.50 to appear in this cluster, or in a limited number of other clusters.
In fact, they occurred in four of the clusters.
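Why the marginal totals matter can be shown with a short sketch, assuming each foil is scored as chosen or not chosen and p11 is the proportion of examinees choosing both foils. Then phi = (p11 - p1 p2) / sqrt(p1 q1 p2 q2), and because p11 can only lie between max(0, p1 + p2 - 1) and min(p1, p2), the selection ratios alone fix the attainable range of phi. The function names are hypothetical; this is standard 2 x 2 algebra, not a computation reported in the study.

```python
import math
from typing import Tuple


def phi(p11: float, p1: float, p2: float) -> float:
    """Phi for two dichotomous foils from the joint proportion p11 and marginals p1, p2.

    Assumes 0 < p1 < 1 and 0 < p2 < 1 (otherwise the denominator is zero).
    """
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))


def phi_range(p1: float, p2: float) -> Tuple[float, float]:
    """Smallest and largest phi attainable when the marginals are fixed at p1 and p2."""
    lowest_joint = max(0.0, p1 + p2 - 1.0)  # overlap forced once p1 + p2 exceeds 1
    highest_joint = min(p1, p2)             # overlap cannot exceed the smaller marginal
    return phi(lowest_joint, p1, p2), phi(highest_joint, p1, p2)


low, high = phi_range(0.83, 0.61)  # selection ratios of 1D1 and 2D1 in Table 53
print(f"{low:.2f} to {high:.2f}")
```

For these two selection ratios the attainable range is roughly -.36 to +.57, well short of the nominal -1 to +1, which is why the sizes of the selection ratios can shape cluster formation.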

Of the five foils in this cluster, three were in items
which occur in a common right answer cluster. This finding was
suggestive, once again, of a right answer/wrong answer interpretation,
but insufficient to lead to an unambiguous interpretation of the
cluster under discussion.

Also, three of the five foils in Table 53 were classified as OG
(Overgeneralization, i.e. 1D1, 8D1, 17D1) and a fourth as 0 (Other,
i.e. 22D), which means it could be treated as an OG as well. Thus, the
advance classification seemed to be the most promising basis for
interpretation, making it necessary to re-examine the classification of
2D1 for its logico-semantic relationship with the stem.

Item 2: Which of the following factors is the most important
causative factor of contempt for stupidity?

2D1 Compulsory school attendance. (Sub)

Probably the best procedure in the analysis of this foil is to
use a Venn diagram, as given in Figure 3, page 213.

Figure 3 shows clearly that "contempt for stupidity" and
"compulsory school attendance" (Foil 2D1) are disjunctively related.
For a factor to be "causative" it must be either conjunctively or
implicatively related to the other factor. Either conjunction or
implication is a necessary but not a sufficient condition for causation.
It would be reasonable to suggest that the student who replaced this
disjunctive relationship with a conjunctive relationship (ignoring the
fact that either can occur without the other) is substituting one
category for another. It was upon this basis that this foil (2D1) was
originally classified as a Sub (Substitution).

On the other hand, if the student considers the relationship as
implicative (i.e. compulsory school attendance can occur without
contempt for stupidity, but not vice versa), this interpretation could be
considered an OS (Oversimplification). The thinking in this case
involves proceeding from an entire set to a subset of that entire set,
as indicated by Arrow #1 in Figure 3.

If the student begins with the conjunctive subset and extends
this to include all cases of contempt for stupidity, the thinking
process would follow the path of Arrow #2 in Figure 3. In such a case
(proceeding from a subset to an entire set), the most reasonable
interpretation of the thinking is OG (Overgeneralization). These
arguments suggest the importance of the way in which an item is
interpreted with respect to the way in which a foil should be classified.
As just shown, this foil (2D1) can reasonably be argued to have at least
three possible classifications, depending upon the interpretation placed
upon the foil by the examinee.
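The set relations behind this Venn-diagram argument can be made concrete by treating each concept as the set of cases in which it occurs. The sketch assumes our own reading of the text's terms: conjunctive means the two always co-occur, implicative means one set lies wholly inside the other, and disjunctive means they overlap but either can occur without the other. The function name and the case numbers are hypothetical, not data from Figure 3.

```python
from typing import Set


def relation(a: Set[int], b: Set[int]) -> str:
    """Classify two sets of cases using an assumed reading of the text's terms."""
    if not (a & b):
        return "disjoint"      # no case shows both
    if a == b:
        return "conjunctive"   # the two always occur together
    if a < b or b < a:
        return "implicative"   # one set lies wholly inside the other
    return "disjunctive"       # overlap, but either can occur without the other


# Hypothetical cases: contempt for stupidity vs. compulsory school attendance.
contempt = {1, 2, 3}
attendance = {2, 3, 4, 5}
print(relation(contempt, attendance))               # partial overlap
print(relation(contempt & attendance, attendance))  # overlap taken as a subset
```

The first call corresponds to the disjunctive relationship the text reads from Figure 3; reading the relationship as implicative, as in the OS interpretation, corresponds to the second.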

A decision rule was needed to deal with foils which might 
reasonably be classified in several different categories. Where a 
cluster seemed, in general, to reflect one category of foil, and the 
same interpretation was one of the possible classifications of the 
ambiguous foil, then this classification was assumed to be an appropriate 
interpretation of the ambiguous foil for the particular group on which 
the cluster analysis was conducted. The most important characteristic 
of this rule was the requirement that the interpretation of a foil 
should probably not be generalized beyond the group of examinees upon 
whom the interpretation was established. 

Hence, since 2D1 could be an OG, this wrong answer cluster could
be interpreted as an OG cluster for Group A.
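The decision rule above, combined with the "most frequent category" rule used throughout this section, can be sketched as follows. This is a hypothetical illustration, not the study's procedure as implemented. The names `interpret_cluster` and `WILDCARD` are ours, the 0 (Other) class is treated as compatible with any category, and each foil's alternative classifications must be supplied by hand from the logico-semantic analysis.

```python
from collections import Counter
from typing import Dict, Set, Tuple

WILDCARD = "0"  # the "Other" class, which may take any interpretation


def interpret_cluster(advance: Dict[str, str],
                      alternatives: Dict[str, Set[str]]) -> Tuple[str, int]:
    """Pick the most frequent advance category and count the foils compatible with it.

    advance maps each foil to its advance classification; alternatives maps a
    foil to the other classifications judged plausible for it.
    """
    counts = Counter(c for c in advance.values() if c != WILDCARD)
    if not counts:  # every foil is "Other": nothing to choose between
        return WILDCARD, len(advance)
    category, _ = counts.most_common(1)[0]  # ties are NOT resolved meaningfully here
    compatible = sum(
        1
        for foil, c in advance.items()
        if c in (category, WILDCARD) or category in alternatives.get(foil, set())
    )
    return category, compatible


# Cluster W1 as listed in Table 53; 2D1 may also be read as OS or OG.
w1 = {"1D1": "OG", "2D1": "Sub", "22D": "0", "8D1": "OG", "17D1": "OG"}
print(interpret_cluster(w1, {"2D1": {"OS", "OG"}}))  # ('OG', 5)
```

Where two categories tie, as OG and CM do in cluster W3, `most_common` simply returns whichever was counted first, so a tie has to be settled by the logico-semantic analysis instead.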

Wrong answer cluster W2 was eliminated before logico-semantic
analysis because all of the foils in this cluster had a selection
ratio of less than .06.

For Cluster W3 (see p. 215) there is no common content. Items
4, 13, and 6 comprise all the items in right answer cluster C, but these
items represented less than half of the members of W3. Only one of the
foils has a selection ratio of more than .50, and both OG and CM have
two representatives from the advance classification of foils. Hence, the
interpretation of this cluster could not be established by any of these
methods.

TABLE 54

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W3

Foil    Content                 Advance Classification    Dn
?       Stupidity               OG                        .10
26D1    Progress                0                         .15
5D      Awareness               CM                        .61
6D      Awareness               Sub                       .22
14D     Stupidity, Awareness    OG                        ?
13D     Awareness               CM                        ?
29D2    Discipline, Progress    OS                        .15

In the ensuing discussions a possible common element among the
foils in a cluster is suggested. The clue which was used in this case
was the presence of OG, OS, and CM (Common Misconception) in the same
cluster, of which OG and CM were the most frequent. Powell and Isbister
(1969) found a polarity between OG and OS on the one hand, and CM on the
other. This polarity in a factor matrix indicates a significant negative
correlation between the poles. Since this polarity, at least in part,
could have been an artifact of the mutually exclusive selection of
responses within items, it is reasonable to suggest that OS, OG, and CM
foils may be related. In addition to using the "most frequent" rule, it
might be possible to reinforce the interpretability of this cluster as
CM if most of the remaining foils had CM as one of their possible
alternative classifications.

To begin with, foil 26D1, as a member of the 0 category, can have
any interpretation which is reasonable for the remainder of the foils.

A discussion of the others follows:

Item __: The author, in charging that "society teaches contempt
for stupidity and fear of being regarded as stupid" by
means of the school, is assuming that:

The school should not be an enforcing arm for the customs
of society. (OG)

This foil is wrong on two counts. In the first place, it is a
conclusion rather than an assumption made by the author. Second, it is
overstated by containing a value judgment which may be unwarranted.
This second reason led to its OG classification. It is, however, in
addition, one of the ways of phrasing a very common argument against the
establishment of parochial schools. In this latter context it could be
a CM foil as well. However, this argument is shaky at best, and would
not be likely to extend beyond the context of the group upon which this
cluster was established.

Item 29: On the basis of the foregoing, which of the following
seems to be the most important consideration when
preparing anecdotal records or progress reports?

29D1 Make no attempts at interpretation since your judgments
are probably biased. (CM)

29D2 Be as brief as possible, giving no information which may
cloud the central problem. (OS)

Foil 29D1 was dropped on the basis of its low selection ratio.
Since in the advance classification two foils in the same class in the
same item were not entertained, and since the "brief as possible" part
of 29D2 is an oversimplification, foil 29D2 was originally classified
as an OS foil. However, the phrase "which may cloud the central problem"
in foil 29D2 is similar in its central idea to "your judgments are
probably biased," this idea being another way of saying that a person
should be as objective as possible. Students who emphasized this idea
in their thinking could be responding more to the second part of the
statement than to the first (OS) part, making CM (Common Misconception)
a reasonable alternative classification for this foil. Of course, there
is the problem of the "relevancy of details" which this foil may also
raise. It is, however, better to put in details which the writer may
think irrelevant, and an independent observer may not, than to require
the independent observer to infer these details from the context because
of their omission. This aspect of the discussion leads to another common
misconception, namely that the simple act of speaking or writing has
produced a successful communication.

Item 6: The purpose of developing a generalized intellectual
awareness is to:

6D Stimulate thinking ability within the individual's chosen
field.

The confining of thinking ability to "the individual's chosen
field" is false. It is a substitution for the phrase "which is not
contextually bound." Once again, however, the frequency with which this
misconception is encountered suggests that the foil could be regarded as
a CM foil as well as a Sub foil.

Item 14: Which of the following best describes the probable
relationship between contempt for stupidity and
generalized intellectual awareness?

14D Contempt for stupidity should be reduced and awareness
should be increased. (OG)

This foil is clearly an OG foil since it adds an unwarranted
value judgment to an otherwise correct statement. Should the
definition of CM foils be extended to incorporate this characteristic?
In any event, six of the seven foils in this cluster could be assigned
fairly reasonably to the CM class, making the CM interpretation of the
entire cluster plausible, if not reasonable. Table 55 on Cluster W4
follows.

TABLE 55

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W4

Foil    Content                    Advance Classification    Dn
30D3    Summary of all passages    RT                        .10





W4, as a single member cluster, should probably be dropped
unless a good reason for retaining it can be found. As an approach to
the problem of classifying Cluster W4, the fate of other RT foils
proved helpful. Foil 11 was originally in this same wrong answer
cluster (W4) but was dropped because of its low selection ratio. Foils
13D and 15D3 were both in wrong answer cluster W7, and this cluster
remained uninterpreted. The separating factor between 15D3, 29D, and
30D3 may have been along content lines, since each of these is from a
different reading selection. Foil 29D was also dropped because of its
low selection ratio. Foil 13D had a similar but not identical background
to 30D3, but was dissimilar to 15D3 as to content, which was part of the
classification problem of W7. The other single member cluster (W10)
contains foil 29D1, which was originally classified as CM but could also
be an RT. If the foil in W10 is an RT, the separation is on a content
basis. The low average total correct scores for these two foils (11.1
and 11.0) may be taken as equivalent except for content; hence, this
cluster (W4) was retained as an RT. Table 56 on Cluster W5 follows.



TABLE 56

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W5

Foil    Content       Advance Classification    Dn
12D3    Aggression    Inv                       .10
11D2    Aggression    OS                        .30
12D2    Aggression    OG                        .35
19D2    Progress      0                         .14
20D     Progress      0                         .37



By the rules used thus far, except for the logico-semantic
interpretation, Cluster W5 should be classified as 0 (Other), since this
is the most frequently occurring equivalent advance foil class. The
unusual element in this cluster, however, is the fact that there are two
foils from the same item in this cluster. This event violates the
assumption behind the rule for classification discussed earlier on
page 153, number 5.

Since the reclassification of any one of foils 12D3, 11D2, or
12D2 to the same category as either of the others would give that
classification to four out of the five foils in this cluster, and since
content, the size of the Dn, and the distribution in right answer
clusters of the corresponding items do not account for this cluster, a
logico-semantic analysis of the two foils in Item 12 would seem
reasonable.
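The assumption that no cluster contains two foils from the same item can be checked mechanically before any interpretation is attempted. The sketch assumes foil labels of the form item number, "D", optional foil index (as in 12D2); the helper name `items_with_multiple_foils` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed label format: item number, "D", optional foil index (e.g. "12D2", "20D").
FOIL_LABEL = re.compile(r"(\d+)D(\d*)")


def items_with_multiple_foils(foils: Iterable[str]) -> List[int]:
    """Return, in ascending order, the items contributing more than one foil."""
    items = []
    for label in foils:
        match = FOIL_LABEL.fullmatch(label)
        if match is None:
            raise ValueError(f"unrecognized foil label: {label!r}")
        items.append(int(match.group(1)))  # keep only the item number
    return sorted(item for item, n in Counter(items).items() if n > 1)


# The five foils of cluster W5 (Table 56).
print(items_with_multiple_foils(["12D3", "11D2", "12D2", "19D2", "20D"]))  # [12]
```

A non-empty result flags a cluster, like W5, where the usual classification rule cannot be applied without further logico-semantic analysis.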

Item 12: Aggressive behavior in female children is:

12D2 More unpredictable and expressed differently than in males
of the same age group. (OG)

12D3 Less differentiated in expression than in males of the
same age. (Inv)

Foil 12D3 was classified as Inv (Inversion) because it is
opposite to a true statement. It is not opposite to the actual correct
answer but to a statement that could have been used as an alternative
correct answer if the examiner had so chosen. On the other hand, 12D2
was classified as OG because the first two words (more unpredictable)
form an incorrect statement added to a correct statement. However,
these two words are incorrect by virtue of being opposite to the truth
(Inv) when the restriction "of the same age group" is applied to this
statement.

This last property reinforces the Inv (Inversion) classification
of this foil. It may be argued that two Inv foils were possible in
Item 12 because of the complex logical structure which this item
required before it could be answered. (See the discussion of right
answer cluster Cg on pages 118 to 120.)

Item 11: Overt aggression would likely be decreased by:

11D1 Blocking many modes of aggression. (Inv)

11D2 Lessening the threat of punishment. (OS)

Lessening the threat of punishment, or permissiveness, does not,
by itself, either increase or decrease overt aggression. It is on these
grounds that this foil was classified as an OS. Overt aggression will
be likely to increase if permissiveness develops frustration. The
energy of the children must be channelled into alternative directions if
overt aggression is to decrease in a permissive setting. Thus, it is an
oversimplification to say that aggression is likely to either increase or
decrease in a permissive setting. However, lessening the threat of
punishment, in the absence of alternatives, will probably increase overt
aggression. This foil could have been classified as an Inv if it were
not for the fact that 11D1 was already so classified. An alternative
possible classification for 11D1 is given with the discussion of W8
(see p. 227).

Perhaps the reclassification of 11D2 as Inv is a bit tenuous.
The other four foils in this cluster are not on such shaky ground, so
that it is reasonable to reclassify wrong answer cluster W5 as Inv.
Information for the interpretation of wrong answer cluster W6 appears
in Table 57 on page 222.

Short of a close logico-semantic analysis of the characteristics
of the foils in Table 57, there is no clear basis for the interpretation
of cluster W6.

It should be pointed out that the fact that these foils have
formed a cluster has led to the assumption that there must have been a
common logico-semantic element in these foils, at least so far as the
members of Group A were concerned. The argument which follows for the
formation of a new class of foil (namely, Non Sequitur, NS) should not
be construed to deny the plausibility of the original classification of
the foils in this cluster, but only to assert the inappropriateness of
these classifications for this particular group of examinees. By this
point in the study, it had become clear that the foil categories were not
mutually exclusive, and that it seemed possible to classify at least
some foils in several different ways (see 2D, p. 131 ff). Thus, the
foil interpretations presented here most probably apply only to Group A.

TABLE 57

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W6

Foil    Content       Advance Classification    Dn
9D      Aggression    Inv                       .11
25D1    Progress      Sub                       ?
21D     Progress      0                         .23
7D1     Awareness     OS                        .51
26D3    Progress      Irr                       .13

Item 9: The basic position of the author in writing about
aggression is that it:

9D Can be eliminated through the process of socialization.
(Inv)

On the contrary, Kagan and Moss (1962) assume that aggression is
one of several innate behavior systems. Being innate makes its
elimination impossible. However, the socialization process can channel
aggression away from its more destructive aspects. A number of
classifications of this foil are possible. Once again, these
classifications are dependent upon several possible logico-semantic
distinctions which can be made. Most simply, the foil does not follow
from the data, i.e. it is a non sequitur relationship. There was,
however, no such classification in the advance classification. This NS
(non sequitur) category was eliminated, as indicated above, on the basis
that it displayed experimental dependencies with CM foils in the Powell
and Isbister (1969) study.

Item 25: The most useful suggestion to help Chester is:

25D1 To give Chester personal warmth, acceptance, and support
wherever it is appropriate. (Sub)

As the right answer indicates, Chester needs "concrete help in
getting started on specific tasks." In other words, his problems would
seem, from the progress report, more developmental than emotional. The
procedure given in 25D1 substitutes a treatment procedure designed to
deal with emotional problems for a procedure designed for developmental
problems. By itself, the use of 25D1 is simply inappropriate.
Reclassifying 25D1 into a new NS category would seem to be reasonable.

Item 21: Chester is growing very slowly and really quite
immature for his grade. Everyone expects too much of
him.

21D This hypothesis is refuted by the facts. (0)

On the contrary, the only information directly available about
Chester's physical development is the number of days he has been absent
or tardy. This limited information is insufficient to form any
conclusions concerning the physical areas of development given in
Item 21. He shows some indications of specific academic immaturity, and
perhaps some social immaturity, but this is all. As to whether or not
the expectations made of him are unreasonable, it cannot be decided from
the information given. The only expectation statement made about his
work is "Is not ready for division," which does not sound like a
statement of overexpectation.

Of course, 21D could have been reclassified without this
analysis because of its 0 (Other) advance classification. However,
its relationship to the correct answer supported the use of an NS
category for this foil. Of the other foils in Item 21, one was dropped
for underselection and another formed part of one of the two unnamed new
categories to emerge from this study (W_). This information also implies
the reasonableness of the formation of a new NS category for wrong
answer cluster W6.

Item 7: Of the following, the best example of generalized
intellectual skill is:

7D1 Applying abstract principles to new situations. (OS)

Comparing 7D1 with the "correct" answer, "The widely applicable
technique of logic," showed this foil to be without question an OS foil,
since it is a true statement of narrower generality than the correct
answer. As a true statement it cannot be classified as an NS, which
leaves the interpretation of this cluster, so far as it rests on this
item, in some doubt. Any alternative interpretations, such as one
involving its large selection ratio (.51) or one suggesting a tenuous
link between NS (Non Sequitur) and OS (Oversimplification), would be
premature at this stage. Cross-validation data help to clarify this
cluster somewhat. The fact that this foil moves, in Group B, to a wrong
answer cluster which is most prominently composed of members from W7
(the wrong answer cluster from Group A which proved to be unclassifiable)
relieves the problem of interpreting this cluster somewhat, but does not
solve it. Foils 9D and 25D1 moved together, as did 21D and 7D1, while
26D3 migrated by itself to a new cluster. This cluster held together
better than any other in cross-validation with the exception of W_.

Item 26: If additional information on Chester is desired and
none of the following has been attempted, which one
would provide the greatest amount of immediately
useful information?

26D3 A request for the assistance of a guidance counselor.
(Irr)

The Dn (selection ratio) on this foil is .13. It might be
interesting to know who made these selections. Most of the examinees
in this group were practicing teachers, and they would know from
experience that the usual information from the guidance counselor would
merely reinforce what they already knew and not add much further useful
information. In the absence of an NS category, this foil is clearly an
Irr.

Of the five foils in this cluster, four could have been
classified quite reasonably as NS foils. The fifth one (7D1) could not
have been so classified, meaning that this cluster could not be
unambiguously classified. However, the best interpretation for this
cluster within Group A is to establish a new class of foil, namely:

Non Sequitur (NS): This type of foil is a wrong answer foil by virtue of
the fact that it simply does not follow from the given
information. As a rule, some part of the foil stands in
direct contradiction to the logical structure or
connotative meaning of the background information (or
part thereof) required to answer the question.



Wrong answer cluster W6 was classified as containing NS foils,
at least so far as Group A was concerned. Table 58, giving information
for the interpretation of wrong answer cluster W7, follows.

TABLE 58

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W7

Foil    Content                 Advance Classification    Dn
?       Aggression              CM                        ?
?       Progress                ?                         ?
?       Stupidity               ?                         ?
?       Stupidity, Awareness    ?                         ?
?       Aggression              RT                        ?
12D1    Aggression              CM                        .35
15D3    Discipline              RT                        .88
18D     Discipline              Irr                       .86
26D2    Progress                OS                        .34
8D3     Aggression              Inv                       .12


The information in Table 58 provides no clear basis upon which
to interpret wrong answer cluster W7. It contains two CM's, two RT's,
and two Irr's. In general, the lower Dn's relate to the selection on
aggression (26D2 is the exception).

No detailed discussion will be given for this group. One
illustration is sufficient. Foils 10D and 12D could conceivably be
reclassified as RT foils on the basis of logico-semantic analysis, as
could 19D by the "0" rule. However, such a reclassification is
unreasonable for foils 18D, 26D, and 8D, among others. Similar findings
occurred for other pairs in this wrong answer cluster.

The inability to interpret this cluster led to its being dropped
from further relevant analysis. This was somewhat unfortunate, since
several of these foils have a high selection ratio, meaning that a fair
amount of information was lost by this decision. Nonetheless, it is
reasonable to argue that if a wrong answer cluster cannot be given an
adequate label, it should not be used. Table 59, giving information for
the interpretation of wrong answer cluster W8, follows.



TABLE 59

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W8

Foil    Content                 Advance Classification    Dn
11D1    Aggression              Inv                       .20
28D     Stupidity, Progress     IA                        .06
14D     Stupidity, Awareness    Sub                       .10



Table 59 shows that Cluster W8 also seems to be ambiguous.

Item 11: Overt aggression would likely be decreased by:

11D1 Blocking of many modes of aggression. (Inv)

There are several ways in which this foil can be interpreted.
The blocking of modes of aggression would not reduce overt aggression
except in those specific areas where the blocking occurred. Overt
aggression would increase in areas where the blocking was absent or less
effective, or more socially acceptable to the peer group. If the
blocking increased the frustration level, the absolute incidence of overt
aggression would also increase. For this latter reason (i.e. since
blocking would probably have the opposite of the desired effect), this
foil was classified as an Inv. However, to arrive at the conclusion that
this foil is correct, the examinee must make the invalid assumption that
attempts to regulate overt behavior also regulate innate drives. This
foil could, therefore, be classified as an IA foil. This decision gave
the cluster a majority of IA foils. If IA was also a reasonable
alternative for 14D, then IA would be a reasonable classification for
this cluster so far as Group A was concerned.

Item 14: Which of the following best describes the probable
relationship between contempt for stupidity and
generalized intellectual awareness?

14D Either will increase with an increase in the other. (Sub)

Foil 14D was one of the foils about which the raters showed
considerable disagreement. It was classified as Sub because part of the
relationship was wrong, and it did not seem to relate to the other Inv
foils in the pilot study. This relationship could also be an Inv,
because a direct relationship is logically opposite to an inverse
relationship. Powell and Isbister (1969) encountered the same problem
in logical relations (as compared to logical fallacies) type foils. On
the other hand, this foil could wrongly be considered correct if the
examinee forms an invalid assumption relating "critical" thinking with
contempt for stupidity by defining stupidity in terms of uncritical
thinking.

Other possible classifications were given to this foil with
similarly tangled arguments justifying each rater's conclusions.

Two of these foils can reasonably be regarded as IA foils; the
third could well be stretching the point. In any case, a reasonable
over-all classification for this wrong answer cluster would seem to be
IA. Table 60, giving information for the interpretation of wrong answer
cluster W9, follows.



TABLE 60

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W9

Foil    Content       Advance Classification    Dn
3D      Stupidity     OS                        ?
28D     Progress      Irr                       .25
10D2    Aggression    Inv                       .06
17D     Discipline    Sub                       .09
27D2    Progress      0                         .06
5D      Awareness     OS                        .12
In Table 60 the two OS's and the 0 foil account for half of the
foils in Cluster W9, so that an OS classification of this cluster falls
within the rules based on the advance classification. It would be better
if the other three could be alternatively classified as OS.

Item 28: In Chester's progress report, which one of the
following is the most important factor contributing to
his difficulty with school achievement?

28D His ability to develop a generalized intellectual
awareness. (Irr)

This foil was classified as Irr on the basis that in Grade 6 most
children are still too young to have progressed very far into "Formal
Operations," which form the basis of generalized intellectual awareness.
This classification assumes that the examinees know this information
about development, which is an unreasonable assumption. In the absence
of this information, Chester's problems also involve decoding skills in
reading and, to a lesser extent, in arithmetic. Hence his problems
involve more than this foil suggests, and the foil might, for this
reason, be reclassified as an OS foil.

Item 10: With which of the following statements concerning
aggression would the author be most likely to agree?

10D2 Aggression generally interferes with the attainment of
educational goals. (Inv)

This foil was classified as an Inv because the best answer was
"Aggression is potentially useful for educational purposes."
Superficially, 10D2 would seem to be opposite to the right answer. On
the other hand, aggression can interfere with the educational process.
Aggression expressed in the form of competition may be a useful form of
intrinsic motivation. The term "generally" in the foil overstates the
case for the negative aspects of aggression, whereas the foil itself
understates the total picture. This foil might be classified as either
OS or OG depending on how the examinee looks at the item.

Item 17: From the above passage we can infer that Doris'
leadership of the group was:

17D Laissez-faire. (Sub)

The reason for classifying this foil as Sub was that Doris'
attempts to coerce were ineffective, so she let the teacher take over.
As a result her later leadership was laissez-faire, but only under the
arbitrary intervention of the teacher. As pointed out in the original
discussion of this item, even this argument is stretching the point
(see pp. 172, 173). There is no similar plausible argument which might
make this foil an OS.

Five of the six foils in this cluster can be included in the OS
category; hence the advance classification of this wrong answer cluster
is retained. Information for the interpretation of wrong answer cluster
W10 follows in Table 61.

TABLE 61

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W10

Foil    Content                 Advance Classification    Dn
29D1    Discipline, Progress    CM                        .08
Cluster W10, as given in Table 61, is the second of two single
member clusters. It contains only 29D1, which was originally classified
as a CM for the reasons already given (see pp. 180, 181). However, the
fact that it did not occur in a common cluster with the other CM foils
raises some doubts over this classification within the confines of
Group A. It is possible, however, to classify this foil into other
categories.

Item 29: On the basis of the foregoing, which of the following
seems to be the most important consideration when
preparing anecdotal records or progress reports?

29D1 Make no attempts at interpretation since your judgments
are probably biased. (CM)

The suggestion in this foil that all interpretations are
sufficiently biased so as to be of little value has the effect of
redefining the term "interpretation" to mean "biased interpretation."

In this case, it would be necessary to assume that the other
single member group (W4), which is also an RT (Redefinition of Terms),
split from this one along content lines. The obvious vocabulary-content
linkage in both of these foils makes this conclusion reasonable. Foil
29D1 was reclassified, therefore, as RT (Redefinition of Terms), making
Cluster W10 an RT cluster. The apparent content binding of some
misreading type foils would seem reasonable, since there seems to be a
group of foil levels parallel to the right answer levels in Bloom's
Taxonomy. Bloom's description of the levels in the Taxonomy suggests a
steady progression away from context as the level of the categories
increases. For this reason, it seemed reasonable to combine W4 with
W10 as an RT cluster in subsequent analysis, rather than to discard both
clusters because of their small size.



Table 62 gives information for the interpretation of wrong
answer cluster W₁₁.



TABLE 62

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁₁

Foil    Content       Advance Classification    Dₙ
18D₁    Discipline    Sub                       .11
?D₂     Awareness     WW                        .31
6D₃     Awareness     Irr                       .23
16D₃    Discipline    CM                        .14
2D₂     Stupidity     OS                        .09
10D     Aggression    WW                        .55



In Table 62, the most frequent foil category in Cluster W₁₁ was
WW (Word-Word Link), which, by the decision rules, serves to identify
this cluster.
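The decision rules are not restated in this section, but as they are applied to Clusters W₁₁, W₁₄ and W₁₅ their core appears to be a plurality count over the advance classifications. The Python sketch below assumes exactly that: a cluster takes the name of its single most frequent advance category, and a tie returns None to signal that logico-semantic analysis is needed. The function name `interpret_cluster` is hypothetical, and the study's full decision rules may contain conditions this sketch omits.

```python
from collections import Counter


def interpret_cluster(advance_classes):
    """Return the single most frequent advance classification in a wrong
    answer cluster, or None when the list is empty or two or more
    categories tie for first place (assumed plurality rule)."""
    counts = Counter(advance_classes).most_common()
    if not counts:
        return None
    top_category, top_count = counts[0]
    # A tie for first place means there is no single most frequent category,
    # so the cluster would have to be interpreted by logico-semantic analysis.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_category


# Advance classifications taken from Table 62 (W11), Table 65 (W14), Table 66 (W15).
w11 = ["Sub", "WW", "Irr", "CM", "OS", "WW"]
w14 = ["Irr", "Irr", "Sub", "Irr"]
w15 = ["Tr", "O"]
for name, cluster in [("W11", w11), ("W14", w14), ("W15", w15)]:
    print(name, interpret_cluster(cluster))
```

Under this assumed rule, W₁₁ comes out as WW and W₁₄ as Irr, while the two-foil Cluster W₁₅ returns None, which matches the later observation that it had no single most frequent category.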

A logico-semantic analysis of the other foils might reinforce
this classification.

Item 18: From the description of the incident, we can conclude
that the teacher's handling of the incident was:

18D₁ Good: she intervened to prevent a serious conflict from
continuing. (Sub)

The reading selection states: "There seemed to be confusion in
this group so I decided to investigate." The similarity between this
statement and the phrase "intervened to prevent" in the foil is
self-evident. WW would seem to be a reasonable alternative classification
for this foil.

Item 6: The purpose of developing a generalized intellectual
awareness is to:

6D₃ Give the individual an ever-widening view of his world.
(Irr)

The similarity between the phrase "ever-widening view" in this
foil and the phrase "free-ranging understanding" in the reading
selection warranted the use of the alternative class of WW for this foil.

Item 2: Which of the following is the most important causative
factor of contempt for stupidity?

2D₂ Compulsory written examinations. (OS)

The phrase "compulsory written examinations" occurs in both this
foil and the reading selection.

Item 16: If the teacher had written "Doesn't work well with
others," as an anecdotal record for the above incident,
this would have been:

16D₃ Worse: Teachers are failing in their obligations in not
supplying complete information. (CM)

Foil 16D₃ shows no connection with either the stem or the reading
selection comparable to the ones presented above. Nonetheless, the
discussion above in general supports the retention of WW as a
reasonable interpretation for this wrong answer cluster.

As shown in Table 63 (see: p. 235), both foils in Cluster W₁₂
(19D and 24D) are classified as O (Other), and both come from the same
content; there may therefore be some doubt about the interpretation of
this cluster.

TABLE 63

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁₂

Foil    Content     Advance Classification    Dₙ
19D     Progress    O                         .09
24D     Progress    O                         .07

One obvious course of action with this foil cluster would be to
drop it from further analysis. On the other hand, the foil classes in
the Guidelines made no pretence at being exhaustive. Since experience
with other foil clusters has been that logico-semantic analysis has
often revealed a possible common basis for interpreting the foils
within a particular cluster, such an analysis of this cluster may help
to show how the list of Guidelines might be extended. For this reason,
these foils were also analysed.

To begin with, the fact that they stood apart from all of the
interpreted categories suggests that these two foils may form a foil
class whose basis has not yet been determined, but which is possibly
related to the special format of items 19 to 24 inclusive.

Items 19 to 24 inclusive used a classification protocol designed
to get the examinees to treat the statements represented by these items
as hypotheses to be tested on the basis of the information in the reading
selection from Prescott (1957) concerning Chester's Progress Report. The
categories were:

1. supported.

2. implied.

3. refuted.

4. insufficient evidence.

In these categories there is a hierarchy of inferential support
from insufficient evidence, to implied, to supported.

The "correct" answer for item 19 is "implied," and the
corresponding answer given in 19D was "supported." Similarly, the
"correct" answer for item 24 was "insufficient evidence," and the
corresponding answer in this cluster was "implied." Each answer given
in these foils was one step higher in the hierarchy than the "correct"
answer. This "overstatement" was not the same as "overgeneralization"
as defined in this study. Whether or not such a relationship is
exclusive to this type of question remains undetermined. Rather than
risk premature naming, the O (Other) classification of this cluster was
retained. Since another cluster (W₁₃) was also interpreted as O, a
subscript was applied to distinguish between these two
clusters.
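The "one step higher" relationship can be made concrete by treating the three supporting categories as an ordered scale. The sketch below is an illustrative encoding rather than part of the original protocol: `SUPPORT_LEVELS` and `steps_above` are hypothetical names, and "refuted" is deliberately left off the scale because the text places only insufficient evidence, implied and supported in the hierarchy.

```python
# Hypothetical ordered scale of inferential support, lowest to highest.
# "refuted" is not on this scale, so passing it raises ValueError.
SUPPORT_LEVELS = ["insufficient evidence", "implied", "supported"]


def steps_above(correct, chosen):
    """Number of hierarchy steps by which the chosen answer overstates the
    correct one (a negative value would mean an understatement)."""
    return SUPPORT_LEVELS.index(chosen) - SUPPORT_LEVELS.index(correct)


# The two foils of this cluster as described in the text:
# (item, "correct" answer, answer given in the foil).
cluster = [
    (19, "implied", "supported"),
    (24, "insufficient evidence", "implied"),
]
for item, correct, chosen in cluster:
    print(item, steps_above(correct, chosen))
```

Both foils come out exactly one step above the keyed answer, which is the "overstatement" pattern described for this cluster.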

Once again, Cluster W₁₃, as given in Table 64 (see: p. 156), is
an O classification based on the advance foil classification. The
logico-semantic analysis of this cluster proved particularly revealing.
A slight change of format will be used in this discussion, with all
three items presented before they are discussed as a cluster.

Item 21: Chester is growing very slowly and is quite immature
for his grade. Everyone expects too much of him.

21D The hypothesis is supported by the facts. (O)

Item 28: In Chester's progress report, which one of the
following is the most important factor contributing to
his difficulty with school achievement?

28D₂ His weakness in reading which is affecting all areas of
learning. (OG)

Item 23: Chester's reading deficiency has not yet begun to
affect seriously his performance in other areas.

23D The hypothesis is refuted by the facts. (O)



TABLE 64

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁₃

Foil    Content     Advance Classification    Dₙ
21D     Progress    O                         .07
28D₂    Progress    OG                        .32
23D     Progress    O                         .17



As it happens, the hypothesis in item 23 is supported rather
than refuted by the facts. Notice, however, that this same
contrary-to-fact conclusion is stated in foil 28D₂ and implied in the
response to item 21. The positive statement of this false conclusion was
most frequently selected (i.e., 28D₂, for which Dₙ was .32). The negative
statement (23D) was less frequently selected (Dₙ = .17), and the implied
statement (21D) least frequently (Dₙ = .07). This ordering would seem
reasonable. Also, this cluster holds together better than any other in
cross-validation (two of the three form a new three-member cluster).
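The ordering argument reduces to ranking the three foils of Cluster W₁₃ by their Dₙ values from Table 64. A minimal sketch, with the positive, negative and implied labels taken from the text; the dictionary layout is only an illustration.

```python
# D_n values for the three foils of Cluster W13 (Table 64), labelled by how
# each foil expresses the shared contrary-to-fact conclusion.
foils = {
    "21D (implied statement)": 0.07,
    "28D2 (positive statement)": 0.32,
    "23D (negative statement)": 0.17,
}

# Rank the foils from most to least frequently selected.
ranking = sorted(foils, key=foils.get, reverse=True)
for label in ranking:
    print(f"{label}: {foils[label]:.2f}")
```

The ranking runs positive, then negative, then implied, which is the order the text describes as reasonable.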



It would seem, then, that this cluster is content-bound, not so
much on the basis of a common reading selection as on a single
common contrary-to-fact conclusion formed by some of the examinees.

The O (Other) designation was therefore retained, making this
wrong answer cluster O₂. Table 65, giving information on Cluster W₁₄,
follows.



TABLE 65

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁₄

Foil    Content       Advance Classification    Dₙ
3D      Stupidity     Irr                       .23
25D     Progress      Irr                       .19
9D₂     Aggression    Sub                       .1?
27D     Progress      Irr                       .09



Wrong answer Cluster W₁₄ contained Irr foils in three out of
four cases. Foil 9D₂ cannot be reclassified as an Irr foil, since it is
not a true statement. However, the Irr classification of this cluster
was retained on the basis of the decision rules already discussed
(see: p. 23?). Table 66, giving information for the interpretation of
wrong answer cluster W₁₅, follows (see: p. 239).
TABLE 66

INFORMATION FOR THE INTERPRETATION
OF WRONG ANSWER CLUSTER W₁₅

Foil    Content                     Advance Classification    Dₙ
30D₂    Summary of all selections   Tr                        .15
24D     Progress                    O                         .06



Wrong answer cluster W₁₅, as given in Table 66, could not be
interpreted without logico-semantic analysis because, under the advance
foil classification, it had no single most frequent category. A
reasonable approach would be to investigate the possibility that 24D
might also be classified as Tr.

Item 24: Chester's mother has kept after him about his reading
until he hates it.

24D The hypothesis is refuted by the facts.

Several facts which have a bearing on Item 24 can be derived from
Chester's "Progress Report." The three most important are:

1. "Does not enjoy reading." 

2. "Has a wide speaking vocabulary." 

3. "Enjoys spelling." 

From these three facts, two conclusions can be drawn:

1) Chester's background must be fairly verbal; hence his
reading deficiency is probably not the direct product of a
background disadvantage.

2) Since he likes spelling and he "is better able to find facts
than to interpret facts," his reading problem would seem to
be a decoding problem.
In addition, the report gives no direct evidence about Chester's
home background. Hence the best answer for this question is
"insufficient evidence." For the proposition presented in Item 24 to be
refuted by the facts, more would have to be known about the probable
sources of the decoding problem and the wide speaking vocabulary. Only
if these two considerations were emphasized beyond reason could the
refutation be acceptable. This conclusion might be expected to have
arisen, then, from a reordering of the emphasis given to some parts of
the reading selection vis-à-vis other parts. The classification of 24D
as Tr would seem to be appropriate in this case, giving the entire
cluster this same interpretation.



